Introduction


ABOUT THE AUTHORS 


As friends and colleagues for many years, we’ve taught 
university non-science majors, retirees, clubs, Elderhostel 
classes, and various community groups about DNA, 
genes, and the human genome. We've enjoyed giving 
hands-on DNA workshops for K-12 teachers and middle 
and high school students, and we have even taught peo- 
ple how to isolate their own DNA from the cheek cells 
inside their mouths (safe, painless, cheap, and easy!). 

The invitation to write the new edition of Alcamo’s 
book brought a surge of anticipation and excitement, 
and years of work! We hope that the book reflects Ed 
Alcamo’s clear and fluent writing style about science. 
To bring the 3rd edition as up to date as is possible (for 
a printed book), we added several new chapters cover- 
ing advances in gene therapy, stem cells, drug design 
and development, bioinformatics, and animal and plant 
biotechnology. 


ABOUT OUR TARGET AUDIENCE 


We wrote DNA and Biotechnology for a wide audi- 
ence that includes college and high school students as 
well as laypeople with varying backgrounds in science. 
Our goal for the book is to help people to better under- 
stand how genes control cell function, how persistent 
and ingenious scientists tracked down the means of 
control, and how we as a society now use this knowl- 
edge to explore further, to heal, and to attempt to 
improve life. We tried to make this book very readable 
without oversimplifying the picture of a living cell, full 
of the thousands of molecular machines made from 
protein and RNA parts, reading the DNA genome, and 
performing the functions that make life possible. 


ABOUT LEARNING FEATURES IN THE BOOK 


The 3rd edition includes features designed to make 
learning about DNA much easier: 


Chapter Outline: Each chapter starts with a convenient 
outline that gives you a succinct overview of how 
the topics and subtopics interrelate. 


Hot Topic Box: Every chapter draws in readers with 
a recent, attention-grabbing news headline, brief 
story, and explanation of its relevance to the 
scientific information in the chapter. 

Looking Ahead: This section presents broad learning 
objectives that orient readers to fundamental goals 
for understanding and communicating about the 
chapter topics. 

Special Topic Boxes: These sections emphasize people 
and scientific discoveries that have a special 
connection or relevance to chapter topics. 

Boldface Terms: New terms are introduced using 
boldface type. All boldface terms are defined in the 
Glossary at the end of the book. 

Summary Statements: Short summary statements 
punctuate each chapter, helping readers to 
identify important points and orient themselves 
when reviewing the information. 

Summary: The summary at the end of each chapter 
brings together the key points and relates them to 
each other more immediately than the full chapter 
treatment allows. 

Review Questions: Ten broad-based review questions 
provide an opportunity for readers to test their 
recall and comprehension of the information in 
the chapter. 

Additional Reading: Recommendations include 
both current and historically relevant sources 
(books, newspapers, magazine and research 
articles, and web sites) that help readers delve 
further into topics of particular interest. 

Glossary: A glossary at the end of the book defines the 
boldface terms introduced in each chapter. 


ABOUT USING THE BOOK 


Organization of Chapters (3rd edition) 


Chapter 1: The Roots of DNA Research
Chapter 2: The DNA Double Helix
Chapter 3: DNA in Action
Chapter 4: Tools of the DNA Trade
Chapter 5: Working with DNA

Chapter 6: Human Genomics
Chapter 7: Bioinformatics
Chapter 8: DNA Forensics
Chapter 9: Exploring Cell Fate
Chapter 10: Human Genetic Diseases

Chapter 11: Gene Therapy
Chapter 12: Stem Cell Research
Chapter 13: Pharmaceutical Biotechnology
Chapter 14: Animal Biotechnology
Chapter 15: Agricultural Biotechnology
Chapter 16: Genes and Race


The 16 chapters in DNA and Biotechnology are
arranged in three groups that give teachers the
flexibility to select chapters based on the scientific
background of the students in the class. The first five
chapters (Chapters 1-5) form a core of content that
is essential for understanding the rest of the book.
This core includes the basic structure and functions of 
DNA, RNA, and proteins in cells (Chapters 1 and 2), 
explains how gene expression controls cell function 
(Chapter 3), and describes the recombinant DNA clon- 
ing technologies that fundamentally changed DNA 
research (Chapters 4 and 5). 

Building on the foundation of the first five chapters, 
Chapters 6-10 provide an opportunity for readers who 
are curious about the role of DNA and genes in mod- 
ern research. Automated DNA sequence analysis has 
become routine, and hundreds of genome sequences 
have been analyzed, in addition to the entire human 
genome (Chapter 6). The deluge of primary sequences 
has fed the emerging bioinformatics field, which uses 
information technology to store, explore, and anno- 
tate DNA, RNA, and protein sequences (Chapter 7). 
DNA technology has enabled us to seek out and
identify specific DNA sequences in many contexts, with
applications in fields such as criminal forensics
(fingerprinting) and medical diagnostics (Chapter 8).
Molecular genetics research explains how multiple 
genome mutations can cause a cell to lose growth con- 
trol and turn into a cancer cell (Chapter 9). The chro- 
mosome locations of genes and mutations that cause 
many genetic diseases such as cystic fibrosis, sickle 
cell anemia, muscular dystrophy, Huntington’s disease, 
and many more, have been identified (Chapter 10). 
These chapters tie together the basic functions of genes 
and proteins in cells (Chapters 1-5) with the advances 
in DNA-based technologies in the research lab, many
of which harness the same molecules. The biological 
mechanisms employed by the cell to replicate DNA 
and make RNA have led to the development of the 
most important techniques used in molecular biology 
and genetics. 

Chapters 11-16 focus on several specialized appli- 
cations of DNA biotechnology. For example, finding a 
mutant gene that causes a genetic disease opens the 
door to the possibility of a gene therapy treatment 
(Chapter 11). The science of human embryonic stem 
cells is described, as are the very exciting iPS (induced 
pluripotent stem) cells, which are derived from adult 
human skin cells but look and act like embryonic stem 
cells (Chapter 12). New genetic strategies for design- 
ing drugs and the development of nanocarriers that 
deliver drugs directly into cells are just two examples 
of new areas of pharmaceutical research (Chapter 13). 
Advances include transgenic animals and plants genet- 
ically engineered to produce antibiotics, drugs, and 
hormones (Chapters 14 and 15). DNA research shows 
that individual human genomes are almost identical 
in DNA sequence and that all people have exactly the 
same genes. What does this mean about our under- 
standing of “race”? (Chapter 16). 
Chapter 1: The Roots of DNA Research


Looking Ahead
Introduction 
Developing a Theory of Inheritance 
When Counting Counts: Mendel’s Approach Yields 
the Basis of Modern Gene Theory 
Morgan’s Fruit Fly Experiments Reveal That Mendel’s 
Factors Are on Chromosomes 
Factors Become Genes, and DNA Is Discovered 
Relating DNA to Heredity 
A “Transforming Factor” Changes the Inherited 
Characteristics of Bacteria 
Using Viruses, Hershey and Chase Establish DNA as 
the Agent of Inheritance 
Summary 
Review 
Additional Reading 
Web Sites 


My Genome, Myself: Seeking Clues in DNA 


The New York Times, November 17, 2007 

By Amy Harmon 

The exploration of the human genome has long been rel- 
egated to elite scientists in research laboratories. But that is 
about to change. An infant industry is capitalizing on the plung- 
ing cost of genetic testing technology to offer any individual 
unprecedented—and unmediated—entree to their own DNA. 

For as little as $1,000 and a saliva sample, customers 
will be able to learn what is known so far about how the 
billions of bits in their biological code shape who they are. 
Three companies have already announced plans to market 
such services, one yesterday. 

Offered the chance to be among the early testers, I
agreed, but not without reservations. What if I learned I was
likely to die young? Or that I might have passed on a rogue
gene to my daughter? And more pragmatically, what if an
insurance company or an employer used such information
against me in the future?

But three weeks later, I was already somewhat addicted
to the daily communion with my genes. (Recurring note to 
self: was this addiction genetic?) 

[To read on, go to http://tinyurl.com/2zuqsh.] 
From the preceding article, we can see that we are at
the start of an era of personalized genomes. How long 
will it be before, alongside tongue depressors, cotton 
balls, and blood pressure cuffs, a plastic card with your 
DNA chip becomes a routine part of a visit to the doc- 
tor’s office? The article represents the type of advances 
that 150 years of research has brought us, research that 
started with the studies discussed in this chapter. 

We will begin with the work of a monk in the 1850s 
whose different ideas and meticulous methods of 
investigation yielded a foundation from which the next 
two generations of scientists could grow, toward an 
understanding of heredity at the molecular level. And 
grow they did, developing the roots of what we now 
know to be DNA science. As you will see, researchers 
often had to struggle against preconceived notions of 
what could (and could not) be the biological material 
that transferred characteristics from one generation to 
the next. It took persistence to establish DNA, a mate- 
rial with only four components, as a carrier of infor- 
mation, when proteins, which have 20 components, 
were much more familiar and well understood. (In fact,
we can wonder what preconceived notions the scientists
and students of the future will discover concerning our
generation.) Here we will retrace the steps
of these persistent, open-minded scientists who paved 
our way to DNA. 


LOOKING AHEAD 


DNA technology has its foundations in genetics, the 
science of heredity. It is appropriate, therefore, to open 
this book by exploring the insights and experiments 
that led scientists to recognize DNA as the hereditary 
substance. When you have completed the chapter, you 
should be able to do the following: 


• Understand the differences between prokaryotic
and eukaryotic cells.

• Recognize how the experiments of Gregor Mendel
focused attention on cellular factors as the basis for
inheritance.

• Understand the circumstances under which
Mendel’s experiments were verified and how Sutton
related Mendel’s “factors” to cellular units called
chromosomes.

• Show how Morgan related eye color in fruit flies to
chromosomes.

• Appreciate the origin of the term “gene” and describe
how the gene concept emerged.

• Recount Miescher’s work on nuclei, and
conceptualize how Feulgen and Mirsky contributed to the
insight that genes are composed of DNA.

• Understand the significance of Griffith’s experiments
in bacterial transformation, and conceptualize how
the transforming principle was identified as DNA.

• Explain the seminal experiments of Hershey and
Chase, and describe why their results pointed to DNA
as the substance controlling protein and nucleic acid
synthesis.

• Increase your vocabulary of terms relating to DNA
technology.


INTRODUCTION 


In past centuries, it was customary to explain inherit- 
ance by saying, “it’s in the blood.” People believed that 
children received blood from their parents and that a 
union of bloods led to the blending they saw in one’s
characteristics. Such expressions as “blood relations,” 
“blood will tell,” and “bloodlines” reflect this belief. 

However, by the 1850s, scientists were question- 
ing the blood theory of inheritance. They could see quite 
clearly that semen contained no blood, and it was appar- 
ent that blood was not being transferred to the offspring. 
But if blood was not the hereditary substance, then 
what was? 

It was a long road to understanding that DNA medi- 
ates inheritance. By the end of the 1800s, the blood 
basis of heredity was challenged and eventually dis- 
carded. In its place, scientists developed an interest 
in nucleic acid molecules organized into functional 
units called genes. Scientists guessed that genes control 
heredity by specifying the production of proteins. But 
even the gene basis of heredity was hard to believe 
because the amount of nucleic acid in the cell seemed 
insignificant. 

The gene basis for heredity has become one of the 
foundation principles of biology. In the pages ahead, 
we will explore the development of the gene theory 
and note how interest grew in DNA as the substance of 
the gene. Long before scientists could apply the fruits 
of DNA research to modern technology, they had to 
learn what DNA was all about. “What purpose,” they 
asked, “does DNA serve in a living cell?” 


Box 1.1 Cell Geography Sets the Stage 


The term “cell,” in general, refers to a small room or compart- 
ment. The smallest compartment of an organism that is consid- 
ered to be alive is a cell, and so it is regarded as the fundamental 
unit of life. Cells are the natural environment for all the proc- 
esses discussed in this book. Getting the lay of the land, then, is 
important in understanding DNA and how it works. 

There are two kinds of biological cells: prokaryotic and 
eukaryotic. 

Prokaryotic cells (Figure 1.1A) contain one continu- 
ous space in which cellular materials are organized, but not 
separated by membranes. A cell wall surrounds nearly all
prokaryotic cells. Prokaryotes are usually single-celled organisms and
include both bacteria and archaea. Many archaea live in extreme
environments (for example, boiling hot thermal vents, freezing
cold arctic waters, and oil wells).

In contrast, eukaryotic cells (Figure 1.1B) contain subcellu- 
lar compartments. A membrane surrounds each compartment, 
and there are several different types of compartments, called 
organelles. Eukaryotic cells are usually 10- to 100-fold larger 
than prokaryotic cells (though there are a few exceptionally 
large bacteria that defy this rule). Eukaryotes include both
single-celled organisms (the majority) and multicellular organisms.
Among the eukaryotes, plant cells and fungal cells are surrounded by
a cell wall, and its composition is very different from that of a
prokaryotic cell wall.


In eukaryotes, each type of cellular compartment is
specialized. The nucleus is home to the vast majority of the
DNA, where it is complexed with proteins. The endoplasmic
reticulum and the Golgi complex compartments are involved 
in protein synthesis and trafficking (that is, sending proteins to 
their correct destinations). Mitochondria possess a membrane 
specialized for energy production. Chloroplasts are centers 
for photosynthesis. 

A prokaryote’s DNA is tightly coiled with proteins in a 
region called the nucleoid. The nucleoid is not a compart- 
ment; it is just the DNA and proteins compacted together. 
In prokaryotes, some membrane-associated functions, such 
as energy production, are accomplished by the plasma 
membrane. 

The presence or absence of subcellular compartments 
leads to differences in how cellular events take place. 
Compartments allow for increased complexity and regula- 
tion. For example, in prokaryotes, RNA synthesis and pro- 
tein synthesis take place in the same compartment, so while 
RNA is being made from a DNA template, it can also be 
read nearly simultaneously to synthesize a protein. In 
eukaryotes, RNA synthesis occurs in the nucleus, and pro- 
tein synthesis occurs in the cytoplasm. The separation or 
uncoupling of these processes into compartments allows
for intermediate steps to occur between them. For example,
splicing of RNA in different patterns allows several different
proteins to be produced from a single gene. This is one of
the contributions to the complexity of eukaryotes as
compared to prokaryotes.


FIGURE 1.1 (A) A generalized prokaryotic cell. (B) A generalized
eukaryotic cell. Features common to prokaryotic and eukaryotic cells
include ribosomes, the molecular machines that synthesize
proteins (although the exact makeup of the prokaryotic and
eukaryotic ribosomes is different), the plasma membrane, and
the watery interior environment, the cytoplasm.


DEVELOPING A THEORY OF INHERITANCE


When Counting Counts: Mendel’s Approach Yields the Basis of Modern Gene Theory


In the mid-1800s (around the same time that scientists
started questioning the blood theory of inheritance),
a relatively obscure Austrian monk named Gregor
Mendel (pictured in Figure 1.2) was conducting
experiments to reveal the statistical pattern of inheritance.
Mendel’s great contribution to science was the
discovery of a predictable mechanism by which inherited
characteristics move from parents to offspring.
His work with plants laid the groundwork for intensive
studies in genetics, a science that would blossom in
the early part of the twentieth century.

Mendel lived in a region that relied heavily on agriculture,
so it was not uncommon for educated individuals
to have an interest in animal and plant breeding. Mendel
had studied plant science at the University of Vienna,
and he continued his interest in plants at the monastery
at Brno (now a part of the Czech Republic). He began
a series of experiments to learn more about the breeding
patterns of pea plants. Peas were well suited for his
work because they were easy to cultivate. Moreover, they
had a short growing season, they could be fertilized
artificially, and they resisted interference by foreign pollen.

Other important features of pea plants were their
easily distinguished traits. Mendel observed, for example,
that his garden had some pea plants with wrinkled
seeds and others with smooth seeds; some had green
pods, and others had yellow pods; some had white
flowers, and others had red flowers. Figure 1.3 shows
this diversity. The more Mendel pondered the source
of variations, the more his curiosity was aroused. He
set out to determine how the variations originated and
how the traits were passed to the next generation.


A key ingredient in Mendel’s success was the plant he used 
to track inherited traits. The short generation time, obvious 
characteristics, and ease of breeding of pea plants pro- 
vided a fertile ground (so to speak) for his observations and 
experiments to flourish. 


Mendel studied pea plants by crossing plants hav- 
ing a certain characteristic with others having a con- 
trasting characteristic. He then studied how traits 
were expressed in the offspring plants. Mendel found, 
for example, that by breeding selected tall plants to 
selected short plants, he could obtain plants that were 
exclusively tall. The trait for shortness had apparently 
disappeared. But when he bred the tall plants from this 
first generation among themselves, some short plants 
reappeared in the next generation among the tall 
plants. These results were unexpected and perplexing. 

Mendel’s forte was mathematics. He carefully
counted the plants displaying a particular characteristic
and the plants having the contrasting characteristic (for
example, tall plants and short plants); he discovered
consistent ratios of traits among the offspring. He noted, for
example, that crossing the first generation’s tall plants
among themselves always seemed to yield three tall
plants for every short plant, as Figure 1.4 shows. (By that
time, the monks in the monastery were noticing that peas
had become a fairly regular item on the dinner table.)

Many scientists of the 1850s believed that a single 
factor controlled a trait, but Mendel, reasoning that 
one of the factors was obtained from the male and one 
from the female, began with the assumption that each 
trait was controlled by two factors (although the nature 
of the factor was unknown). He guessed that the fac- 
tors express themselves in the offspring, but that one 
is dominant over the other. For example, the factor for 
tall plants dominates over the factor for short plants (it 
suppresses the short-plant factor, which is said to be 
recessive). The factors are then passed on to the next 
generation. Today we know Mendel’s factors as genes. 
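The two-factor model described above can be checked by simple counting. The sketch below is a minimal Python illustration, not Mendel’s own procedure: it assumes the letters T (tall, dominant) and t (short, recessive) as labels of our own choosing, and it assumes that each parent passes on either of its two factors with equal chance. Listing every pairing of one factor from each parent, as a Punnett square does, reproduces the all-tall first generation and the 3:1 ratio in the second.

```python
from collections import Counter
from itertools import product


def monohybrid_cross(parent1: str, parent2: str) -> Counter:
    """Count offspring phenotypes for a cross between two parents.

    Each parent is written as its two factors, e.g. "Tt" (hypothetical
    labels: uppercase = dominant tall factor, lowercase = recessive short
    factor). Each parent passes on one of its two factors with equal
    chance, so every pairing below is equally likely.
    """
    phenotypes = Counter()
    for from_p1, from_p2 in product(parent1, parent2):
        # A single dominant factor is enough to make the plant tall.
        if from_p1.isupper() or from_p2.isupper():
            phenotypes["tall"] += 1
        else:
            phenotypes["short"] += 1
    return phenotypes


# P generation: purebred tall (TT) bred to purebred short (tt).
f1 = monohybrid_cross("TT", "tt")
print(f1)  # Counter({'tall': 4})

# F1 tall plants (each carrying T and t) bred among themselves.
f2 = monohybrid_cross("Tt", "Tt")
print(f2)  # Counter({'tall': 3, 'short': 1})

# Fraction of tall plants expected in the F2 generation.
print(f2["tall"] / sum(f2.values()))  # 0.75
```

The four pairings in the second cross are equally likely, so the 3 tall to 1 short count corresponds to the 75% and 25% described for Figure 1.4; actual counts of real plants only approximate this ratio.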

From his work, Mendel developed a theory of 
inheritance completely at odds with the blood basis 
of heredity. Mendel’s results implied that sperm and 
egg cells, not blood cells, carry the factors of inherit- 
ance. Moreover, Mendel surmised that the factors are 
discrete units, not some vague, mysterious elements of 
the blood. Aware of the unconventional nature of his 
suppositions, Mendel avoided controversy by keeping 
his suppositions largely to himself. 

Mendel’s theory came to be known as the theory of
transmissible factors. The theory was revolutionary for
its time, but Mendel did not stop there. For many years he
investigated how one factor in the pair dominates the
other factor and how a pair of factors separates during
transmission to the next generation. He experimented
up to the early 1860s and published his results in 1866
in the Proceedings of the Society of Natural Sciences
in Brno. Mendel included a detailed analysis of his
theories in the publication, and he communicated his
findings to other scientists of the times through a series
of letters. In retrospect, Mendel’s observations are
regarded as one of the great insights in science and the
beginning of the discipline of genetics.


FIGURE 1.2 Mendel and his pea plants.
(A) Gregor Mendel (1822-1884), the
Austrian monk who established the
principles of genetics through meticulous
experiments with pea plants. (B) Anatomy
of the pea plant, showing the growth
cycle and the reproductive features that
make artificial pollination feasible.


Mendel’s assumptions were different from those of other 
scientists studying the same topic, so he was led to inter- 
pret his observations differently, developing the concept of 
transmissible factors we now know to be genes. 


Unfortunately, scientists of his time paid little atten- 
tion to Mendel’s work or its implications. One prob- 
able reason is that they had little understanding of 
biological chemistry. Another is that they failed to 
appreciate the significance of the cellular nucleus, 
the chromosomes, or the process of fertilization. Also, 
during the late 1800s, biologists were largely immersed 
in studying the theory of evolution, first promulgated 
in 1859 in Charles Darwin’s epic work On the Origin 
of Species. Research on inheritance and breeding was 
placed on the proverbial back burner as the biologi- 
cal, social, and economic implications of the theory 
of evolution continued to capture the attention and 
imagination of scientists and laypeople. Not until the 
year 1900 would interest in genetics once again come 
to the forefront of science. 
Phenotype                                     Dominant trait        Recessive trait
(1) Shape of seed: round versus wrinkled      Round ripe seeds      Wrinkled ripe seeds
(2) Color of pea: yellow versus green         Yellow peas           Green peas
(3) Color of seed coat: gray versus white     Gray seed coats       White seed coats
(4) Form of the ripe pod: inflated versus     Inflated ripe pods    Constricted ripe pods
    constricted between the seeds
(5) Color of unripe pod: green versus yellow  Green unripe pods     Yellow unripe pods
(6) Position of flower: axial (distributed    Axial flowers         Terminal flowers
    along main stem) versus terminal
    (bunched at the top of the stem)
(7) Length of stem: tall (from 6 to 7 feet)   Tall plants           Short plants
    versus short (from 1/4 to 1 feet)


FIGURE 1.3 The traits of pea plants studied by Mendel. The dominant allele is on the left and the recessive allele is on the right. A
description of the phenotypes associated with specific alleles (traits) is provided.


In the spring of 1900, three European botanists, 
working independently of each other, repeated and 
verified Mendel’s work. Each botanist cited Mendel’s 
article in his research, and each awakened the scien- 
tific community to the work of the pioneering monk. 
It was not so unusual that all three should be aware of 
Mendel’s work, but it was remarkable that the redis- 
covery of his theories was made almost simultane- 
ously by three investigators; indeed, the happenstance 
remains one of the unusual coincidences of scientific 
history. Within weeks, a wave of enthusiasm for
inheritance research sprang up. The discoveries made by
Mendel had been forgotten for more than three decades. Now
they would change scientific thinking forever.


Morgan’s Fruit Fly Experiments Reveal That 
Mendel’s Factors Are on Chromosomes 


During the first years of the twentieth century, Mendel’s 
experiments were carefully studied, and the belief 
emerged that Mendel’s factors were related to parts 
of the cell called chromosomes. Chromosomes (literally
“colored bodies”) are threadlike strands of chemical
material located in the cell nucleus. The threads
of each chromosome consolidate and become clearly
visible under the microscope (Figure 1.5) when a cell
is dividing. With few exceptions, all human body cells
have 46 chromosomes, and the 46 chromosomes are
organized in 23 pairs. (Red blood cells have no
chromosomes, and sperm and egg cells have only 23
chromosomes.) It is now known that chromosomes contain
the DNA that carries the cell’s genetic message.


FIGURE 1.4 Mendel’s experiments with tall and short pea plants.
Mendel bred purebred tall plants to purebred short plants (these
constitute the P generation). He discovered that all the offspring plants were
tall in the first filial (F1) generation. He then bred the tall plants of the
F1 generation among themselves and found that short plants appeared
as well as tall plants in the F2 generation. His meticulous calculations
revealed that about 75% of the plants in the F2 generation were tall,
and 25% were short. This 75% to 25% ratio was equivalent to 3:1. This
led to his assumption that two “factors” for height exist in pea plants
and suggested that one factor dominates over the other.


Among the leaders in chromosome research at the turn
of the century was the American biologist W. S. Sutton.
In 1902, Sutton wrote that certain of Mendel’s rules
of inheritance could be explained if Mendel’s factors
were located on or in the chromosomes. Mendel had
written, for instance, that inheritance factors occur
in pairs, one member of the pair received from each
parent. By 1900, cell biologists had established that
chromosomes also occur in pairs, one chromosome
derived from each parent. Moreover, Mendel theorized
that during the production of sperm and egg cells,
the paired factors separate and move as units to each
cell. Studies in cell biology showed that chromosomes
behave similarly during cell reproduction. Sutton
pointed out that chromosomes could be the hypothetical
inheritance factors Mendel thought responsible for
heredity. Perhaps, he suggested, chromosomes and
inheritance factors were identical.


FIGURE 1.5 Color-enhanced human chromosomes seen under a
microscope after separation from the nucleus. Chromosomes assume
these compact shapes during cell division. With the notable
exceptions of reproductive cells and red blood cells, 46 chromosomes (23
pairs) are present in each human cell. (A) Each individual
chromosome is outlined in white. (B) In a karyotype, chromosomes
photographed under the microscope are cut and pasted into an orderly
display from largest to smallest, except for the sex chromosomes,
which are placed at the end. Images and text: Copyright © 2009 by
Photo Researchers, Inc. All rights reserved.


FIGURE 1.6 Morgan and his fruit flies. (A) Thomas Hunt Morgan, whose experiments revealed that colorless white eyes in fruit flies are
based on the presence of a single chromosome. His experiments related an inherited characteristic to a chromosome. (B) A fruit fly with red
eyes. (C) A fruit fly with white eyes.

To demonstrate the validity of the chromosomal 
theory of inheritance, scientists had to relate at least 
one trait to a cell’s chromosome. But in the early 
1900s, the members of a chromosome pair could not 
be distinguished from each other visually. Thus, it was 
impossible to relate a single trait to a single chromo- 
some by sight alone. 

The problem was resolved in 1910 by Thomas 
Hunt Morgan of Columbia University (pictured in 
Figure 1.6). Morgan used the fruit fly Drosophila 
melanogaster in his work. By careful observations, he 
determined that one of the four pairs of chromosomes 
in the fruit fly determines its sex. This chromosome
pair, he discovered, also carries the factor for colorless
white eyes. Through an exhaustive series of genetic crosses
and statistical analyses, Morgan determined that the
male fruit fly inherits only one copy of the sex-determining
chromosome (now called the X chromosome). Thus, it must
also inherit only one copy of the factor for white eye color,
and white eye color must depend on a single chromosome. By providing
statistical evidence for the relationship between sex 
and eye color in Drosophila, Morgan placed the chro- 
mosomal theory of inheritance on a firm footing and 
enhanced the role of the chromosome as the possible 
vehicle of inheritance. 


Morgan, like Mendel, made careful counts and analyzed 
them to arrive at firm conclusions about inheritance. By 
doing so, he was able to provide ample evidence support- 
ing Sutton’s explanation that inheritance factors resided on 
chromosomes. 


The next question was whether the whole chromo- 
some or a part of a chromosome is responsible for an 
inherited trait. Writing in 1903, Sutton proposed that 
merely a part of a chromosome is the basis for a trait
because there are far too few chromosomes to
account for all of an individual’s traits. Sutton suggested that
“the chromosome may be divisible into smaller entities.” 
Most other scientists agreed, and before long, the concept 
of the gene as the “smaller entity” gained prominence. 


Factors Become Genes, and DNA Is 
Discovered 


In the early 1900s, geneticists began using the terms 
“inheritance unit” and “genetic particle” to describe the 
factors occurring on the chromosomes of Mendel’s pea 
plants. By the 1920s, however, these terms had been 
discarded, and at the suggestion of the Danish
scientist Wilhelm Johannsen, geneticists agreed to use
the word “gene” instead. (“Gene” is derived from the 
Greek gennan, meaning “to produce.”) The term was 
originally used as part of Darwin’s word “pangenesis” 
to describe the theory that the whole body (including 
every atom and unit) “produces” itself over and over. 


In a 1910 article, Johannsen suggested using “gene” 
because it was completely free of connection with any 
hypothesis. The word was and continues to be a less 
cumbersome term than “inheritance unit.” 

Scientists of the 1920s viewed the gene as a specific 
and separate entity sitting on the cell’s chromosomes, 
but not strictly a part of a chromosome. Fortunately, they 
also reasoned that if genes were associated with chro- 
mosomes, a first step in learning the chemical nature of 
the gene would be to learn the chemical composition 
of the chromosomes; this is precisely what researchers 
attempted to do in the early 1900s. 

One possible chemical component of chromo- 
somes was a seemingly unique organic compound of 
the cell nucleus called nucleic acid. Nucleic acid was 
first described in 1869 by the Swiss researcher Johann 
Friedrich Miescher. With great difficulty, Miescher sep- 
arated nuclei from human white blood cells, and he 
searched for evidence of protein within these nuclei. 
Instead of protein, however, he found a substance unlike 
any class of chemicals then known. Miescher named the 
new substance “nuclein” (relating to its source). When 
he identified phosphorus in nuclein, he postulated 
that the substance was a storehouse for phosphorus in 
the cell. 

As Miescher continued his study of nuclein, he 
located it in yeast cells and kidney, liver, and testicu- 
lar cells. He also made the notable observation that 
nuclein was abundant in sperm cells obtained from a 
salmon. Some years later, chemists led by Phoebus 
Levene (Chapter 2) used this information in their studies 
and determined the components of nuclein. They gave 
it the more descriptive and technical name deoxyribo- 
nucleic acid (DNA). Coincidentally, Levene was born 
in 1869, the year of Miescher’s first report of nuclein. 


While studying the chemistry of the cell nucleus, Miescher discovered an abundant, initially mysterious substance he dubbed nuclein. After further chemical characterization by others, it was named deoxyribonucleic acid (DNA).


By the 1920s, it was clear that chromosomes had 
a role in heredity, and the isolation of DNA from cell 
nuclei made this organic substance a good candidate 
for the hereditary substance. Interest in DNA was fur- 
ther strengthened by a 1924 discovery attributed to 
the German biochemist Robert Feulgen, who observed 
that a dye (now called Feulgen stain) turns bright purple when it reacts with DNA. (Figure 1.7 shows an example of this staining characteristic.) The dye could be used to locate cellular DNA and to determine the concentration of DNA at various times in the cell’s life cycle. Isolating DNA was a difficult chore at that time, but Feulgen’s dye technique allowed DNA research to leap ahead without the burden of complex chemical isolations.

FIGURE 1.7 Stained chromosomes. A photomicrograph of a plant cell stained with Feulgen stain to highlight the chromosomes of the cell nucleus. In this view, the chromosomes have replicated and are in the process of separating into two newly forming cells.

TABLE 1.1 A comparison of the DNA content in the tissue cells and sperm cells of various animals

Organism    Tissue cells    Sperm cells
Cow         6.6             3.3
Human       6.4             3.2
Chicken     2.6             1.3
Frog        15.0            7.5

Another observation helped forge the link between DNA and heredity. In the late 1940s, Alfred Mirsky and his coworkers at New York’s Rockefeller Institute reported that, with only two exceptions, all cells
of an organism have virtually the same amount of DNA 
in their nuclei. The two exceptions are reproductive 
sperm and egg cells. These cells contain precisely half 
the amount of DNA found in nonreproductive cells such 
as muscle cells. Table 1.1 presents these data. Mirsky’s 
observation correlated with the theory that sex cells are 
the vehicle for bringing half the genetic information from 
each parent to the offspring. 

The Rockefeller group led by Mirsky also experi- 
mented with the zygote, the cell resulting from the 
union of sperm and egg cells. The researchers found 
that the zygote contains the same amount of DNA as 
other cells of the body. This observation reinforced the 
notion that the DNA from two parents comes together during fertilization. In effect, it also supported the work Mendel had performed more than 60 years earlier, in which he proposed that genetic factors separate when
reproductive cells form, then come together in the 
offspring. The visual observations and chemical 
analyses of DNA led to the conclusion that the DNA 
was reduced by half during formation of the reproduc- 
tive cells, then reconstituted in the offspring zygote. 
Figure 1.8 shows the relationship of DNA to other 
aspects of the body. 
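Mirsky's halving result, and its reversal at fertilization, comes down to simple arithmetic on Table 1.1. The short Python sketch below checks both ideas using only the four rows of the table. Egg-cell values are not listed, so the sketch assumes, as the text states, that an egg carries the same half share of DNA as a sperm cell. The helper names are illustrative, not from any source.

```python
import math

# DNA content values copied from Table 1.1: (tissue cells, sperm cells).
TABLE_1_1 = {
    "Cow": (6.6, 3.3),
    "Human": (6.4, 3.2),
    "Chicken": (2.6, 1.3),
    "Frog": (15.0, 7.5),
}


def gamete_fraction(tissue: float, sperm: float) -> float:
    """Sperm-cell DNA content expressed as a fraction of the tissue-cell content."""
    return sperm / tissue


def zygote_content(sperm: float, egg: float) -> float:
    """DNA expected in a zygote formed by the union of one sperm and one egg."""
    return sperm + egg


for organism, (tissue, sperm) in TABLE_1_1.items():
    fraction = gamete_fraction(tissue, sperm)
    # Assumption (not in the table): the egg carries the same amount of DNA as the sperm.
    zygote = zygote_content(sperm, egg=sperm)
    restored = math.isclose(zygote, tissue)
    print(f"{organism}: sperm/tissue = {fraction:.2f}, zygote restores tissue amount: {restored}")
```

For every organism in the table the sperm-to-tissue ratio prints as 0.50, and adding two gamete values recovers the tissue-cell amount. This matches the conclusion that DNA is halved in reproductive cells and reconstituted in the zygote.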


The relative amount of DNA in the nuclei of resting, divid- 
ing, and reproductive cells made DNA a good contender 
for the substance of chromosomes, but direct evidence that 
DNA was the hereditary material was lacking. 


RELATING DNA TO HEREDITY 


By the late 1920s, two concepts were evolving: genes 
are involved in heredity, and genes are composed of 
DNA. However, most of the evidence remained indirect, resting on inferences drawn from laboratory observations. Scientists needed direct evidence linking DNA to
the development of observable traits. Two series of stud- 
ies would provide that evidence. The first series was per- 
formed by Griffith, Alloway, Avery, and their colleagues. It 
drew a connection between DNA and the appearance of 
traits in bacteria. The second, performed by Hershey and 
Chase, linked DNA to the synthesis of protein. Both were 
landmark achievements in DNA technology; both con- 
nected enough dots to show that DNA is the substance of 
heredity. 


FIGURE 1.8 Relationship between DNA, genes, 
chromosomes, and cells in the human body. 


A “Transforming Factor” Changes the
Inherited Characteristics of Bacteria 


In 1928, a British medical officer named Frederick Griffith 
reported some puzzling results in his work with bacteria. 
Griffith was performing experiments with Streptococcus 
pneumoniae, a cause of bacterial pneumonia. The bac- 
terium, commonly known as the pneumococcus, occurs 
in two strains. One strain is designated S because it 
forms smooth colonies when growing on bacteriological 
medium; the other strain is designated R because it forms 
rough colonies. (Both forms are shown in Figure 1.9.) It is 
well known that S strain pneumococci are lethal to mice 
(as well as to humans), whereas R strain pneumococci 
are harmless. 

Griffith’s initial experiments produced the expected results. He confirmed that the S strain pneumococci were deadly to mice and that the R strain
bacteria were harmless. Then he performed some 
variations of these experiments. Griffith found that 
he could inject debris from dead S strain bacteria 
into mice without harming the animals. As before, 
these results were not surprising because the S strain 
bacteria were dead. However, what happened next 
was remarkable. 

Griffith mixed a sample of live R strain pneumo- 
cocci (harmless) together with debris from dead S strain 
pneumococci (also harmless because no living cells 
were present). Then he injected the mixture into the 
mice. By all expectations, the mice should have lived 
(both the S strain debris and the R strain bacteria were 
harmless). But pneumonia developed in the mice and 
they died, as Figure 1.10 displays. Why did the animals 
die? Griffith’s answer came when he performed an 
autopsy on the animals: their lungs were full of live S 
strain pneumococci, the deadly strain. Apparently, the harmless R strain bacteria had changed into deadly S strain bacteria, which killed the animals.

FIGURE 1.10 The transformation experiments performed by Griffith. (A) When live pathogenic pneumococci (S strain) were injected into mice, the animals died. (B) When live, harmless pneumococci (R strain) were injected, the animals remained healthy. (C) When debris from heat-killed S strain bacteria was injected into animals, they lived and remained healthy. All these results were as anticipated. (D) However, when live, harmless bacteria (R strain) were mixed with cell debris of heat-killed S strain bacteria and the mixture was injected into animals, the animals died. On autopsy, Griffith found live, pathogenic bacteria (S strain); some of the live R strain bacteria had been transformed to live S strain bacteria.

FIGURE 1.9 A macroscopic view of colonies of the two types of bacteria used by Griffith in his experiments in transformation. The bacteria are growing on the surface of a nutrient medium in a Petri dish and can be distinguished by their appearance (among other means). (A) The harmless R (rough) strain colonies of Streptococcus pneumoniae. (B) The disease-causing S (smooth) strain colonies.

Griffith’s “biochemical magic” appeared to work each time he performed the experiment. Moreover, other researchers were able to confirm his findings shortly thereafter. Griffith postulated that something
in the S strain chemical debris (a protein, he believed) 
was entering the R strain bacteria and “transforming” 
them by changing their biochemistry. Unfortunately, 
he was unable to identify the transforming substance. 
Nor would he live to see the significance of his work. 
Griffith died during the German air attacks on London 
in 1941. 


Mixing living, harmless bacteria with dead, formerly disease- 
causing bacteria led to a “transformed” strain of living bac- 
teria that caused disease. The “factor” that transformed the 
bacteria from harmless to harmful remained unknown. 


Griffith's work did not go unnoticed by microbial 
geneticists. In 1933, James Lionel Alloway and his 
group at Rockefeller Institute successfully purified cell 
debris from S strain pneumococci and produced a 
cell-free extract. Then they used the extract to trans- 
form R strain bacteria to S strain bacteria. Moreover, 
they performed the transformation in test tubes, a pro- 
cedure freeing them from the rigors of working with 
animals. Alloway noted that the transforming principle 
could be extracted from the mixture with alcohol and 
that the chemical appeared as “a thick, stringy pre- 
cipitate ... which slowly settled out on standing.” From 
the description, modern biochemists recognize the 
transforming principle as DNA. At the time, however, 
Alloway was inclined to believe the substance was 
made of protein. 

In 1935, Oswald T. Avery (Figure 1.11) and his asso- 
ciates, Colin MacLeod and Maclyn McCarty, began a 
series of exhaustive chemical analyses to identify the 
transforming substance. Beginning with crude extracts from bacteria, they used a process of elimination to determine the nature of the chemical substance by finding out what it was not. As the months and years unfolded, they dismissed proteins, fats, and carbohydrates as possible candidates for the transforming substance. It appeared that the only possible thing left was nucleic acid. Finally, the search narrowed down to one nucleic acid: deoxyribonucleic acid (DNA). In a seminal article published in 1944, Avery and his colleagues presented evidence that DNA is the transforming principle (and opened the way to modern DNA technology). The researchers were not bold enough to claim that DNA was the hereditary material, but the implication was clear: DNA was apparently able to transform bacteria so dramatically that a harmless strain changed to a deadly strain.

FIGURE 1.11 Oswald Avery, the Rockefeller Institute researcher who led the effort to identify DNA as the transforming principle. Avery’s success focused attention on DNA as the chemical material of heredity.

Most scientists overlooked Avery’s discovery and were reluctant to accept DNA as the hereditary substance. The majority of geneticists of the 1940s were not trained in biochemistry, and Avery’s experiments were difficult for some to repeat. In addition, many scientists were hard pressed to believe that results obtained from bacteria could be applied to more complex organisms such as humans. Moreover, the results were published in the Journal of Experimental Medicine, a publication that the geneticists and bacteriologists of that period usually did not read, and the preoccupation with World War II had restricted the dissemination and flow of scientific knowledge while limiting funds for scientific research. As a result, the impact of Avery’s finding was largely lost at first.

Nevertheless, Avery’s group continued their efforts to prove that DNA was more than an inert chemical of the cell nucleus. With great difficulty they isolated a quantity of DNase from beef pancreas cells. DNase is a biological catalyst, an enzyme that destroys DNA but has no effect on other molecules. The investigators mixed their transforming substance with DNase and noted that the mixture lost its ability to transform bacteria. Still, many were not convinced.

Opposition was also voiced from the “protein sup- 
porters.” These scientists believed that protein was the 
genetic material because protein appeared to have the 
necessary complexity to encode all the biochemical 
information in the cell’s nucleus. Proteins are chains built from 20 different, relatively simple organic molecules called amino acids (Chapter 2). A protein can be made
of 10 amino acids or 1000 amino acids or 10,000 
amino acids. The crucial element of any protein is the 
sequence of amino acids—that is, which amino acid 
follows which in the chain. This sequence of amino 
acids is as important to proteins as is the sequence of 
letters in a word (“ton” has a much different meaning 
than “not”). Protein supporters were of the opinion 
that the amino acid sequence in one protein serves as 
a model for constructing a new protein. Their outlook 
would be dealt a severe blow by the 1952 experiments 
of Hershey and Chase. 
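The protein supporters' argument rested on combinatorics: with 20 kinds of amino acids, the number of possible chains grows explosively with length, and order matters just as it does for the letters in “ton” and “not.” A minimal Python sketch of that reasoning follows. The function names are illustrative, and the count is an idealized upper bound that assumes any amino acid may occupy any position; real proteins are constrained by chemistry and function.

```python
from collections import Counter

AMINO_ACID_KINDS = 20  # number of different amino acids used in proteins


def possible_sequences(length: int) -> int:
    """Count distinct chains of the given length, assuming any of the
    20 amino acids can occupy any position (an idealized upper bound)."""
    return AMINO_ACID_KINDS ** length


def same_parts_different_order(a: str, b: str) -> bool:
    """True when two sequences use the same building blocks in a different order."""
    return Counter(a) == Counter(b) and a != b


for n in (1, 2, 10):
    print(n, possible_sequences(n))

# Same three letters, different order, different meaning.
print(same_parts_different_order("ton", "not"))
```

Even a chain of only 10 amino acids has 20^10 (roughly 10 trillion) possible sequences, which is why protein looked like a plausible carrier of complex information before the Hershey and Chase experiments.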


Despite experimentation that eliminated all other known 
candidates as the transforming factor, there was difficulty 
in accepting the idea that DNA could be the agent of trans- 
formation and hence biological inheritance. 


Using Viruses, Hershey and Chase Establish 
DNA as the Agent of Inheritance 


Alfred Hershey and Martha Chase (Figure 1.12) worked 
at the Cold Spring Harbor Laboratory in New York. 
The pair studied bacteria and the viruses that multiply 
within those bacteria (Figure 1.13). In 1952, scientists 
knew that certain viruses use bacteria as chemical fac- 
tories for producing new viruses; however, the actual 
mechanism was uncertain. Biochemists were also 
aware that bacterial viruses are composed of a core 
of DNA enshrouded in a protein coat. What they did 
not know was whether the nucleic acid or the protein 
(or both) directs replication of the virus. Hershey and 
Chase would answer that question and in so doing, 
they would establish the essential role played by DNA 
in cellular biochemistry and inheritance. 

Hershey and Chase made use of the observation that 
viral DNA contains phosphorus (P) but no sulfur (S). By 
contrast, the outer protein coat of the virus has sulfur 
(S) but no phosphorus (P). In their first experiments, Hershey and Chase cultivated viruses with the radioactive forms of phosphorus (³²P) and sulfur (³⁵S). They successfully prepared viruses whose nucleic acid was radioactive with ³²P and whose protein was radioactive with ³⁵S.

FIGURE 1.12 (A) Alfred Hershey and (B) Martha Chase performed the key experiments demonstrating that DNA is responsible for directing the synthesis of viral proteins in a host cell.

FIGURE 1.13 Bacterial viruses. (A) Colored transmission electron microscopic view of viruses that bind to bacterial cells and replicate within them, thereby producing new viral particles. Viruses such as these are called bacteriophages (“bacteria-eaters”). Viruses consist of little more than a fragment of nucleic acid such as DNA (red) enclosed in a protein coat (orange). These viruses are using their tails as syringes to inject their viral DNA into an E. coli cell (blue). (B) Several viruses attacking E. coli. The viruses attach by their “tails” to the surface of the bacterium. The protein coat remains outside the bacterium while the DNA enters the bacterium to direct the replication of viruses. Images and text: Copyright © 2009 by Photo Researchers, Inc. All rights reserved.


Now came the seminal experiments. Hershey and Chase mixed the radioactive viruses with a population of bacteria. Then they waited just long enough for viral replication to begin. At this point, they used an ordinary household blender (now a museum piece) to shear away any viruses and debris clinging to the bacterial surface.

FIGURE 1.14 The Hershey-Chase experiment with viruses (bacteriophages) and bacteria. (A) Viruses are prepared with two radioactive labels, one in the coat protein (³⁵S) and one in the DNA (³²P). (B) The viruses are combined with host bacterial cells and given an opportunity to interact. (C) Then the mixture is agitated in a blender, and the empty viral protein coats are removed and found to contain the radioactive ³⁵S. (D) The DNA carrying the radioactive ³²P is found within the host cell. (E) The phage replicate and (F) assemble into new phage. (G) The new phage are then released from the cell. The results indicate that the phage DNA directs the synthesis of both the DNA and the protein added to make the new phage.

Then the analysis began. Hershey and Chase tested the bacteria and surrounding fluid to find out where the radioactivity was. This would give them a biochemical glimpse of viral replication. After experimentally bursting the bacteria, the researchers found most of the ³²P within the contents of the bacterial cytoplasm. This finding indicated that viral DNA was entering the bacteria. Then they discovered that the ³⁵S was largely in the sheared-away remains of the viruses and in the surrounding fluid. This observation indicated that the protein part of the viruses was remaining outside the bacteria. The results led Hershey and Chase to the inescapable conclusion that viral DNA enters the bacterium, whereas the viral protein remains outside. (Figure 1.14 shows the experiment.) Thus, DNA, and not the protein coat, was the component responsible for directing viral replication.
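The logic of the experiment depends on each radioactive label marking only one viral component. The toy Python model below encodes just that bookkeeping; the names (component_marked_by, infer_locations) are hypothetical. It is not the researchers' analysis, which measured radioactivity quantitatively and found most, not all, of each label in one place, but it shows why the location of each isotope reveals where DNA and protein went.

```python
# Elements present in each viral component, as described in the text:
# DNA contains phosphorus but no sulfur; the protein coat has sulfur but no phosphorus.
COMPONENT_ELEMENTS = {
    "DNA": {"P"},
    "protein": {"S"},
}

# Each radioactive isotope is a form of one element.
ISOTOPE_ELEMENT = {"32P": "P", "35S": "S"}


def component_marked_by(isotope: str) -> str:
    """Return the single viral component that contains the isotope's element."""
    element = ISOTOPE_ELEMENT[isotope]
    matches = [name for name, elements in COMPONENT_ELEMENTS.items() if element in elements]
    if len(matches) != 1:
        # The inference only works if a label marks exactly one component.
        raise ValueError(f"{isotope} does not mark exactly one component")
    return matches[0]


def infer_locations(observed: dict) -> dict:
    """Translate where each label was mostly found into where each component went."""
    return {component_marked_by(iso): place for iso, place in observed.items()}


# Where most of each label was found after blending, per the text.
observations = {"32P": "inside bacteria", "35S": "outside bacteria"}
print(infer_locations(observations))
```

Running it maps the phosphorus label to DNA inside the bacteria and the sulfur label to protein outside, the same inference Hershey and Chase drew.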

Certain experiments stand out as turning points in 
scientific history, and the experiments performed by 
Hershey and Chase are one such turning point. In retro- 
spect, we can see how their results had substantial impact 
on the thinking of that era. Hershey and Chase clarified 
the important aspect of viral replication that nucleic acid 
goes inside the cell, whereas the protein coat remains 
outside. In broader terms, their results strengthened the 
place of DNA in cellular biochemistry. Bacterial viruses, 
it should be remembered, are composed solely of nucleic 
acid and protein, and the Hershey-Chase experiments reinforced the concept that DNA, and not protein, dictates the synthesis of both nucleic acid and protein.


It was becoming clear that all the biochemical information 
for the synthesis of both nucleic acid and protein is stored 
in the DNA. Avery’s experiments performed eight years 
earlier had linked DNA to the genetic material. The work 
of Hershey and Chase considerably strengthened that link. 


Scientists are usually reluctant to run to a window and shout their discoveries to the world, and Hershey and Chase were no exception. Instead, they wrote soberly in a 1952 journal article, “This protein probably has no function in the growth of intracellular phage. The DNA has some function.” Their caution was shared by other biochemists who were equally hesitant to apply processes learned from viral replication to human cell functions. Indeed, late in 1952 James D. Watson, the co-discoverer of DNA’s structure (Chapter 2), read a telling letter from Hershey to a scientific group at Oxford, England. Said Hershey, “My own guess is that DNA will not prove to be a unique determiner of genetic specificity.” Scientific history has proven otherwise, and considerably so.


SUMMARY 


The roots of DNA research can be traced back to the 
innovative experiments of Gregor Mendel conducted 
in the mid-1800s. Mendel postulated that an inherited trait is controlled not by blood, but by “factors”
obtained from the parents. His work with pea plants 
implied that one factor is obtained from each parent 
and that a particular factor may dominate the other. It 
also appeared that the factors separate during transmis- 
sion to the next generation. However, at the time, little 
attention was paid to the work. 

At the beginning of the 1900s, Mendel’s experiments were repeated and verified, and scientists postulated that his “factors” resided on chromosomes. Morgan’s work of 1910 showed that white eye color in fruit flies is associated with a single chromosome, and he postulated that a single chromosome determines a trait. But because individual chromosomes could not explain all traits, geneticists came to believe that entities on the chromosome called “genes” were the hereditary factors. Evidence presented by Miescher, Feulgen, and Mirsky indicated that chromosomes contain deoxyribonucleic acid, or DNA.

Experiments performed by other biologists and 
chemists strengthened the link between genes and DNA. 
In 1928, for example, Griffith showed that the charac- 
teristics of certain bacteria could be changed (i.e., the 
bacteria could be transformed) if they were mixed with 
debris from another type of bacterium. Alloway and his 
group found they could purify the debris to increase its 
potential to transform bacteria, and Avery and his col- 
leagues discovered that the transforming material in the 
debris was DNA. 

The experiments of Hershey and Chase provided strong evidence for DNA’s involvement in heredity. Hershey and Chase performed experiments with bacteria and the viruses that replicate within them. Bacterial viruses are composed primarily of protein and DNA. The experimental results showed that DNA alone directs the replication of new viruses; that is, DNA contains the biochemical information for the synthesis of both the viral protein and the viral DNA. The results obtained by Hershey and Chase strongly supported the conclusion that DNA is the hereditary material and stimulated additional interest in the study of molecular genetics.


REVIEW 


This chapter has recounted the process by which DNA 
was identified as the substance of heredity. To test your 
comprehension of the chapter's major ideas, answer 
the following review questions: 


1. What was Mendel’s great contribution to science, 
and how did his work lay the groundwork for 
studies in genetics? 
2. Draw a picture or diagram that contains all of the
following, and label each: cell, chromosomes, 
gene, DNA, nucleus, and cytoplasm. Is the cell 
eukaryotic or prokaryotic? Label it and state why. 
Learn to draw such a picture without looking at 
the textbook or other source of information. 

3. How did Morgan go about demonstrating that a 
trait can be associated with a chromosome? 

4. How was a gene viewed in the 1920s? 

5. Describe the contributions of Johann Miescher to 
the study of molecular genetics. 

6. What observations made by Feulgen and 
Mirsky helped forge a link between DNA and 
heredity? 

7. Describe the experiments performed by Frederick 
Griffith, and explain why they were significant to 
the development of genetics. 

8. Identify the accomplishment of Avery and his col- 
laborators, and indicate why their work did not 
receive acclaim. 

9. Explain the experiments performed by Hershey 
and Chase, indicating what the “tools” of the 
experiment were and what the results were. 

10. Summarize how the Hershey-Chase experiment 
was a deciding factor in linking DNA to protein 
and DNA synthesis. 
Nature: News and Views, January 30, 2008

By John C. Crocker 

Three-dimensional nanoparticle arrays are likely to be 
the foundation of future optical and electronic materials. 
A promising way to assemble them is through the transient 
pairings of complementary DNA strands. 
One of the staple concepts of nanotechnology is that
of “growing” useful materials or devices by coaxing a 
random mixture of microscopic parts to assemble spon- 
taneously into a desired structure. Versatile self-assembly 
schemes have been demonstrated that use DNA as the pri- 
mary building material.... Two research teams have built 
on the successes with DNA to aid the self-assembly of gold 
nanoparticles. Their technique should also work for other 
varieties of technologically exciting nanoparticles. 

Progress in achieving the directed self-assembly of 
nanoparticles had been elusive, owing to one potentially 
daunting requirement: selective adhesion. Each micro- 
scopic part must be engineered so that it sticks only to the 
others it should abut in the desired final structure... 

This is where DNA comes into its own. Particles carrying 
complementary strands of DNA selectively adhere to each 
other when the strands “hybridize” to form the familiar DNA 
double helix. The final architecture is thus determined not 
by chemistry or charge, but by the lengths and nucleotide 
sequences of the DNA strands. That promises a versatile 
assembly scheme that might be used with particles of nearly 
any material to fabricate nanocomposites or “metamateri- 
als” with unusual electronic and optical properties. The 
applications of such materials might include high-efficiency 
solar panels and lasers, super-resolution microscopes—and 
even coatings to render objects invisible. 


More than 50 years after the discovery of the DNA 
double helix, our knowledge of the structure of 
DNA continues to pay off in ways that even Watson 
and Crick could not have dreamed of in 1953. Now, 
almost a decade into the twenty-first century, scientists 
worldwide are using the unique properties of DNA 
structure, including its ability to store information in 
the double helix, for new and amazing applications in 
science and biomedical research. DNA is being used 
to assemble individual atoms into designer molecules 
using nanotechnology. Scientists are learning how to 
assemble molecules with some amazing properties, 
from fibers that are hundreds of times as strong as steel 
yet weigh one-sixth as much, to nanofactories that 
can assemble nearly anything, from a new iPod to the 
clothes you wear, starting with individual atoms. 
Nanomaterials: Golden Handshake describes how
the structure of DNA is already playing a novel role in 
the development of these futuristic materials. In this 
chapter, we'll explore how the structure of DNA was 
determined, a puzzle that was solved only by interpret- 
ing and integrating data from several different scientific 
fields. This historic achievement was accomplished by 
two scientists who juggled the scientific puzzle pieces 
in their minds (and with cardboard cutouts), but who 
did not actually perform a single hands-on experiment 
with DNA. 


LOOKING AHEAD 


Determining the structure of DNA was one of the 
major scientific achievements of the twentieth cen- 
tury. Knowing the structure of DNA gave scientists 
insight into how heredity works and made the revolu- 
tion in molecular biology and DNA technology possi- 
ble. Moreover, the structure of DNA had an enormous 
impact on our understanding of gene function and 
DNA replication in cells. On completing this chapter, 
you should be able to do the following: 


• Recognize the three fundamental building blocks of nucleotides used to assemble DNA.

• Describe how sugar and phosphate groups link to each other to form the “backbone” of the DNA molecule.

• Summarize Erwin Chargaff’s findings, and indicate why they were important in solving the puzzle of DNA structure.

• Discuss the different contributions of Franklin, Wilkins, Watson, and Crick in determining the structure of DNA.

• Explain what is meant by semiconservative DNA replication.

• Describe the important functional characteristics of the DNA polymerase enzymes involved in duplicating genomic DNA.

• Use your newly acquired DNA vocabulary to read with understanding about DNA-related topics online (google “DNA”), and expand your confidence in learning about DNA.


INTRODUCTION 


By the 1950s, it was becoming increasingly clear to 
the scientific community that the deoxyribonucleic 
acid (DNA) molecule is the basis of genetic heredity. 
It is hard to believe these days, but at the time, very 
little was known about DNA structure. Scientists real- 
ized that they needed to know the molecular structure 
of DNA because the structure of the DNA molecule 
might shed light on the hereditary process; also,
understanding DNA structure might explain how the 
molecule duplicates during cell reproduction. The 
processes of genetic heredity and cellular reproduction 
are among the most fundamental and important events 
in biology, and the quest for this knowledge started 
a race to figure out what a DNA molecule actually 
looks like. 

During the 1940s, top scientists worldwide were 
studying the chemical characteristics of DNA, work 
that was a critical step to prepare for understanding 
the three-dimensional structures of biological macromolecules like proteins and DNA. In fact, world-famous scientist Linus Pauling, then a professor at the California Institute of Technology, combined x-ray crystallography data with careful molecular model building to work out key structural features of proteins (such as the alpha helix) in a series of cutting-edge papers published in 1951. At
that time, the race was on among scientists to deter- 
mine the structure of DNA based on what was known 
about its chemical and physical characteristics. The 
race included researchers James Watson and Francis 
Crick, who determined the structure of DNA and in 
so doing not only gained international fame but also 
opened a door to the molecular investigation of hered- 
ity. As this chapter will show, the work of Watson and 
Crick was the jumping-off point for the science behind 
DNA and biotechnology. 


THE STRUCTURE OF DNA 


Establishing the structure of DNA was one of the major 
achievements of the twentieth century. Not only did 
it yield myriad practical benefits, but it also gave sci- 
entists the philosophical pride of understanding how 
heredity works. Biology has many bedrock principles— 
for example, the cellular basis of living things, the 
germ theory of disease, and the process of evolu- 
tion are three—and the chemical basis of heredity is 
another. Unlocking the secret of DNA was the key to 
understanding this principle. 


DNA Is Constructed from Nucleotide Units


Although the structure of DNA was unknown in the 
1940s, the basic chemical components of DNA had 
been studied for two decades. In the 1920s, Phoebus 
A. T. Levene determined the chemistry of nucleic acids. 
Working with his colleagues at Rockefeller Institute 
in New York City, Levene studied two types of nucleic 
acid—ribonucleic acid (RNA) and deoxyribonucleic acid 
(DNA)—isolated from yeast cells and animal thymus tis- 
sue. Levene’s analyses revealed three fundamental com- 
ponents in both types of nucleic acids: (1) a five-carbon 
sugar, which could be either ribose (in RNA) or deoxyribose (in DNA); (2) phosphate, a chemical group derived from phosphoric acid molecules; and (3) four different compounds containing nitrogen (Figure 2.1).


FIGURE 2.1 Research revealed the fundamental components of nucleic acids. Research in the 1920s by Phoebus Levene and colleagues revealed the three fundamental components of both types of nucleic acids: phosphate, a chemical group derived from phosphoric acid molecules; a five-carbon sugar, which could be either ribose (in RNA, not shown) or deoxyribose (in DNA); and four different compounds containing nitrogen and having the chemical properties of bases (A, G, C, and T).

Because of their nitrogen content and basic quali- 
ties, the four nitrogenous compounds are simply 
referred to as bases. In DNA, the four most common 
bases are adenine (A), thymine (T), guanine (G), and 
cytosine (C). RNA, the other important nucleic acid 
in cells, contains the A, G, and C bases, but a base 
called uracil (U) replaces thymine (T). The adenine 
(A) and guanine (G) bases are double-ring molecules 
called purines, whereas the cytosine (C), thymine (T), 
and uracil (U) bases are single-ring molecules called 
pyrimidines. 


Levene concluded that DNA is composed of three 
essential components that form units, which are in 
turn strung together to form a long DNA chain. In 
contemporary biochemical terms, the units are called 
nucleotides. In DNA, each nucleotide consists of a deoxyribose sugar attached to a phosphate group and to a base (Figure 2.2). Each of the four nucleotides differs from the other three only in its base component.


Units called nucleotides are the basic building blocks of DNA and RNA. A nucleotide consists of a base, a sugar, and a phosphate group. The identity of the base is the only feature that distinguishes one DNA nucleotide from another, or one RNA nucleotide from another.


FIGURE 2.2 Synthesis of the DNA building block unit: a nucleotide. A nucleotide is composed of a phosphate group, a deoxyribose molecule, and a base, here adenine (A). The shaded -OH and -H groups are removed during the synthesis of the nucleotide. This nucleotide is called deoxyadenosine monophosphate (dAMP).


DNA Nomenclature Helps Us Understand
DNA Function


Chemists have developed a nomenclature system to 
identify the molecular structures of many thousands 
of chemical compounds, including the components of 
DNA. The casual student of DNA can perhaps skim over 
these details, but beware that many of the modern terms 
used routinely to discuss DNA refer to a specific feature 
of the DNA molecule that is important in a practical 
sense. For example, the two ends of a DNA strand are 
not identical. In laboratory experiments, proteins that 
respond differently to each DNA end are used routinely, 
so understanding the terminology that refers to each end 
can be critical for success. Thus, students who plan to 
pursue further studies in DNA technology must become 
familiar with fundamental DNA facts and terminology. 
Using standard chemical nomenclature, numbers are assigned to the ring atoms of the base and the sugar. When specifying where a chemical group is attached to a molecule, it is customary to refer to the number assigned to that specific carbon atom (Figure 2.3). The carbon atoms in deoxyribose are numbered 1’ to 5’ (pronounced “one-prime” to “five-prime”), with the numbering starting to the right of the oxygen atom and proceeding clockwise around the ring. The phosphate group attached to the 5’ carbon atom of the deoxyribose establishes the 5’ end of the DNA strand. The functionally important -OH (hydroxyl) group is attached to the 3’ carbon atom of the deoxyribose molecule. This 3’ -OH group is required for the addition of nucleotide units during DNA synthesis (as discussed later in this chapter). As we’ll see, the 3’ and 5’ carbons are important markers for distinguishing the chemical direction (polarity) of a DNA strand (Figure 2.4). The bases are attached to the sugar through the 1’ carbon. Note that the atom numbers in the bases do not use a prime and are thus distinguished from the carbons in deoxyribose.


FIGURE 2.3 Numbering system in a nucleotide unit of DNA. The deoxyribose molecule’s carbons are numbered starting to the right of the oxygen atom, and the numbering proceeds clockwise. A prime (’) is placed next to each number of the sugar, whereas the base component is numbered without primes, thereby distinguishing the atoms of the sugar from those of the base of a nucleotide. When constructing a DNA strand, another nucleotide unit will attach at the 3’ end.


The Two Ends of a Single DNA Strand are 
Chemically Distinct 


Because the 3’ end of the DNA strand contains a hydroxyl group (-OH) and the 5’ end has a phosphate group (P), the DNA strand has polarity (Figure 2.4). In the DNA helix, the two DNA strands are oriented in opposite directions and are said to be antiparallel. The chemical structures at the ends of DNA strands, in both single-stranded and double-stranded DNA, enable the ends to take part in important processes in the cell. These chemical characteristics are also particularly important when DNA is manipulated in the lab; for example, many enzymes can act only on the 3’ end or only on the 5’ end of a DNA strand.


FIGURE 2.4 Nucleotide units are connected to form a single strand of DNA. The phosphate group forms a molecular bridge between the 5’ carbon atom of one nucleotide and the 3’ carbon of the next nucleotide. This type of chemical bridge is called a phosphodiester bond, and it links together the sugars and phosphates into the backbone of the DNA molecule. A single strand of DNA has two different ends: the 3’ end, which has an -OH group that is available for adding more nucleotides onto the DNA strand, and the 5’ end, which has a phosphate group. DNA strands grow at the 3’ end, but not at the 5’ end.


The atom numbers of nucleotides allow us to easily and 
unambiguously refer to important landmarks in the DNA 
molecule. A case in point is the two ends of a DNA strand, 
referred to as 5’ and 3’, which have different properties that 
are vital to understanding how DNA works in a cell. 


Special multiprotein enzymes in the nucleus of the cell synthesize DNA by linking nucleotides to each other: the 3’ carbon of one nucleotide is joined, through its hydroxyl group, to the phosphate group of a second nucleotide. Note that because the “prime” notation follows the 3, we must be referring to carbon 3 on the sugar, not the base. As Figure 2.4 shows, the phosphate group essentially forms a bridge connecting the two deoxyribose molecules. To continue building the DNA molecule, linkages to other nucleotides are forged in the same way, building up a strand of tens, thousands, or even millions of nucleotides. By agreement among scientists, the sequence of the bases in a DNA molecule is read in the 5’ to 3’ direction, starting at the 5’ end of the DNA strand and reading toward the 3’ end.


DNA molecules are synthesized in the 5’ to 3’ direction. 
By convention, nucleotide sequences are written and read 
from 5’ to 3’ as well. 
Levene and Chargaff Provided Chemical
Clues to the Structure of DNA 


Levene’s studies in the 1920s indicated that all four nitrogenous bases, A, G, C, and T, were present in virtually the same amounts in DNA. This conclusion, later proven false, encouraged the belief at the time that DNA is simply a polymer of repeating nucleotide units (e.g., TGACTGACTGAC; thymine-guanine-adenine-cytosine-thymine-guanine-adenine-etc.). Without sequence variation in a repeating chain, it was difficult to see how DNA could provide sufficient diversity to carry any biochemical or hereditary information. This was one reason that Avery’s identification of DNA as the transforming substance in bacteria did not receive immediate acceptance (see Chapter 1). At the time, it appeared that DNA was nothing more than a structural component of chromosomes. However, after World War II, Levene’s chemical analyses of the amounts of each base in DNA were repeated with more sophisticated equipment, and with very different results: the new tests indicated that the four nitrogenous bases were present in unequal amounts in DNA.

Erwin Chargaff’s experiments, reported in 1949, 
indicated that regardless of the source of the DNA, the 
amounts of adenine (A) and thymine (T) were similar 
to each other, and the same was true for the amounts 
of cytosine (C) and guanine (G) (Table 2.1). It appeared 
that for every adenine molecule there was a thymine 
molecule (and vice versa), and for every cytosine mol- 
ecule there was a guanine molecule (and vice versa). 
These observations suggested that DNA is not a simple repeating polymer. Moreover, if the relative amounts of the bases can vary, perhaps DNA has the properties necessary to code for information. And if different organisms have different amounts of bases in their DNA, maybe the bases have something to do with the differences among organisms. Years would pass before the significance of these observations would be fully understood.
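The comparison Chargaff made can be reproduced with a few lines of Python using the percentages listed in Table 2.1. This is a minimal sketch: the names `TABLE_2_1` and `chargaff_ratios` are illustrative and not part of any library, and the numbers are simply transcribed from the table (the Homo sapiens guanine value is read as 19.4). The sketch computes the A/T and G/C ratios, which Chargaff found to be close to 1, and the (A+T)/(G+C) ratio, which differs from species to species.

```python
# Percent of each base (A, T, G, C) for each species, transcribed from Table 2.1.
TABLE_2_1 = {
    "Homo sapiens": (31.0, 31.5, 19.4, 18.4),
    "Drosophila melanogaster": (27.3, 27.6, 22.5, 22.5),
    "Zea mays": (25.6, 25.3, 24.5, 24.6),
    "Neurospora crassa": (23.0, 23.3, 27.1, 26.6),
    "Escherichia coli": (24.6, 24.3, 25.5, 25.6),
    "Bacillus subtilis": (28.4, 29.0, 21.0, 21.6),
}


def chargaff_ratios(a, t, g, c):
    """Return the A/T ratio, the G/C ratio, and the (A+T)/(G+C) ratio."""
    return a / t, g / c, (a + t) / (g + c)


# Print one line of ratios per species.
for species, (a, t, g, c) in TABLE_2_1.items():
    at_ratio, gc_ratio, composition = chargaff_ratios(a, t, g, c)
    print(f"{species:<24} A/T={at_ratio:.2f}  G/C={gc_ratio:.2f}  "
          f"(A+T)/(G+C)={composition:.2f}")
```

For every species in the table, A/T and G/C fall between about 0.97 and 1.05, while (A+T)/(G+C) ranges from about 0.86 (Neurospora crassa) to about 1.65 (Homo sapiens). That pattern, equal partners but a variable overall mix, is exactly the pair of clues described above.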
The differing amounts of the bases in various organisms, coupled with the fact that A and T were always present in equivalent amounts, as were G and C, were important clues to the function and structure of DNA.


Watson and Crick Set Their Sights on Solving 
the DNA Puzzle 


Students of the twenty-first century are often taught 
about the structure of DNA as if it has always been 
known. They learn about DNA’s components, and they 
study its double-stranded spiral form known as the dou- 
ble helix. But in reality, in the early 1950s, biochem- 
ists did not know about either the number of strands 
or the helical arrangement of DNA, nor did they have 
a clear understanding of DNA functions. Many scientists thought that DNA functioned only as a structural support for the chromosomes visible under the light microscope. Although the components of DNA and
their relative amounts were known, the spatial arrange- 
ment of the components remained a mystery. Inspired 
by the need to know, scientists began an unofficial race 
to determine the molecular structure of DNA. 

In the fall of 1951, against this historical backdrop and in this political environment, a young American researcher named James D. Watson, who had recently completed his Ph.D., arrived at Cambridge University in England to work with Francis H. C. Crick, a physicist-turned-biologist who was then still a graduate student. In pursuit of their goal to solve the three-dimensional structure of the DNA molecule, Watson and Crick would do no laboratory bench work; surprisingly, their greatest contribution to science was the result of their remarkable ability to imagine DNA structures in three dimensions. In addition to scrutinizing the research results of many other scientists, Watson and Crick tested homemade, two-dimensional puzzle pieces (paper cutouts of sugars, bases, and phosphates) in various combinations to figure out which pieces fit together. Eventually they were able to propose a three-dimensional model representing the DNA structure.


TABLE 2.1 The base compositions of DNA from various species, as determined by Erwin Chargaff


Species                    A (%)   T (%)   G (%)   C (%)

Homo sapiens               31.0    31.5    19.4    18.4
Drosophila melanogaster    27.3    27.6    22.5    22.5
Zea mays                   25.6    25.3    24.5    24.6
Neurospora crassa          23.0    23.3    27.1    26.6
Escherichia coli           24.6    24.3    25.5    25.6
Bacillus subtilis          28.4    29.0    21.0    21.6


Note that the percents of adenine and thymine are consistently similar, as are the percents of cytosine and guanine.

In the early 1950s, scientists used a technique called x-ray diffraction (Figure 2.5) to determine the molecular structures of various compounds, including proteins. In this process, crystallized molecules are rotated and bombarded with x-rays. The atoms in the crystal deflect (or “diffract”) the x-ray beams, which hit a photographic plate and make a pattern. The diffraction pattern gives a large number of clues to the three-dimensional positions of the atoms in the crystal, but sophisticated mathematics is required to interpret the pattern and arrive at a molecular structure. At the simplest level, the diffracted x-rays can be compared to ripples in a lake created by tossing a rock into the water. (The ripples give an idea of the size and shape of the rock.) It is important to remember that even today, solving the three-dimensional molecular structure of biomolecules by x-ray diffraction is a complex task that requires computers and specialized software. Imagine how difficult it was to use this technology in the early 1950s.


FIGURE 2.5 How crystallography and x-ray diffraction reveal structure. In x-ray diffraction, the x-ray beam is focused on a crystal painstakingly made from a pure solution of the molecule(s) of interest. The uniform structures inside the crystal diffract the x-rays onto photographic film, creating patterns of spots that are characteristic of certain molecular structures.


X-ray diffraction studies are key experiments in constructing models of large molecules such as proteins and DNA.


Franklin’s X-ray Diffraction “Images”
Played a Crucial Role in Visualizing DNA
Structure


Today, students routinely explore computer-rendered 3D models of proteins and DNA that are remarkably accurate and easy to use. But in their time, Watson and Crick were engaged in a much different type of model building to figure out the structure of DNA (Figure 2.7). They had gathered as much information as they could from available experimental results and proceeded to try to integrate the information into an accurate theoretical model of DNA structure. They worked with molecular components represented by cardboard cutouts, sticks, and wire, cut to size according to experimentally derived dimensions. Watson and Crick tried to fit everything together, just like solving a three-dimensional jigsaw puzzle.


Box 2.1

Linus Pauling, Expert Protein Scientist, Enters the DNA Race


Linus Pauling was already a world-renowned scientist at the California Institute of Technology when he entered the race to solve the structure of DNA. Pauling was an expert in protein crystallography and x-ray diffraction, and in a series of cutting-edge papers published in 1951 he worked out key features of protein structure. He brilliantly characterized the now well-known alpha-helical protein structure (Figure 2.6A). Pauling was also experienced with DNA and had proposed a DNA replication mechanism (with Max Delbruck). He was familiar with Oswald Avery’s extraordinary work in 1944 (Chapter 1) indicating that nucleic acids could transmit genetic information, and, again ahead of his time, Pauling proposed in 1948 that genes consist of mutually complementary molecules. However, like many scientists, Pauling still believed that the key to inheritance would be found in the amazing variety of protein structures and not in the apparently repetitive DNA polymer. Pauling actually spoke in person with Erwin Chargaff in 1947. Chargaff told him about his observations on DNA bases, but for some reason Pauling, who found Chargaff’s personality disagreeable, did not heed this important clue to DNA structure. Pauling and his colleagues continued to build DNA models with three DNA strands wrapped around the axis of the DNA helix, with the bases exposed on the outside of the structure (Figure 2.6B).

How did Linus Pauling, who would later win two Nobel Prizes, make such an error? Pauling was at a disadvantage because he did not have access to the beautiful x-ray diffraction patterns of DNA generated by Rosalind Franklin and Maurice Wilkins at King’s College in London. Pauling wrote to Wilkins and asked to see Franklin’s x-ray pictures of DNA. When Pauling was denied, he wrote to Wilkins’ superior and was again denied. After the end of World War II, Pauling was active in a group of concerned scholars that warned the public about nuclear war, a touchy subject at the time. Pauling was awarded the Presidential Medal for Merit by President Truman and given a certificate of appreciation from the War Department. But many viewed his opposition to atomic bombs and atmospheric nuclear testing as unpatriotic and even subversive. In 1952, Pauling’s application for a passport to travel to England was denied because it “would not be in the best interests of the United States.” Linus Pauling stayed home and missed a symposium held in honor of his achievements solving protein structures. He also missed a chance to see Franklin’s x-ray diffraction pictures of DNA. Eventually Pauling’s passport application was approved, much too late for the symposium, but he finally visited England in the summer of 1952. For some reason, Pauling did not visit King’s College or ask to see Franklin’s DNA data.

In 1953 Linus Pauling and his collaborator, Robert Corey, published a paper called “A Proposed Structure for the Nucleic Acids” in The Proceedings of the National Academy of Sciences, which argued for a triple-helix DNA structure. This paper would turn out to be one of the most famous incorrect theories in twentieth-century science.


FIGURE 2.6 Linus Pauling with proposed models of protein and DNA. (A) Linus Pauling used x-ray crystallography and model building to work out key features of protein structure, including the alpha helix protein motif, in a series of groundbreaking papers published in 1951 (OSU Special Connections: Linus Pauling). (B) Pauling and Corey’s incorrect, triple-helical DNA structure, one of the most famous mistakes in twentieth-century science.


At the time Watson arrived in England, Rosalind Franklin (Figure 2.8) and Maurice Wilkins (Figure 2.9) were also working on the problem of DNA structure at King’s College in London. Franklin was an expert at x-ray diffraction, but she was new to working with DNA. Wilkins was new at x-ray diffraction, but he had extensive experience with DNA. Unfortunately, Wilkins and Franklin became scientific adversaries. Rosalind Franklin used her extensive knowledge and experimental skills to capture the best diffraction patterns of DNA made up to that time. The famous pattern called “Photograph 51” (Figure 2.8B) clearly revealed (to those experienced in interpreting diffraction patterns) not only that the DNA molecule was a double helix containing two DNA strands but also the previously unknown dimensions of the DNA molecule: the diameter, the distance per turn of the helix, and the interval between repeating units of the helix (Figure 2.10). In his book The Double Helix (W.W. Norton & Co., 1980, p. 98), written many years later, Watson described his view of the photograph as follows:


The instant I saw the picture my mouth fell open and my pulse began to race.... the black cross of reflections which dominated the picture could arise only from a helical structure... mere inspection of the X-ray picture gave several of the vital helical parameters.


These important dimensions turned out to be essen- 
tial for building the correct DNA model structure. 
Franklin’s data were scheduled to be published in 1953. In early 1953, Watson met with Wilkins and during discussions was shown some of Franklin’s data, including Photograph 51, without her knowledge. When
Watson and Crick returned to model building, they 
eventually went on to construct their final (and correct) 
DNA model (Figure 2.7). 

Photograph 51 contained critical data, including that the diameter of the DNA molecule was 2.0 nanometers (nm, a billionth of a meter). At this time, many models, including the early Watson and Crick models and the triple helix published by Linus Pauling (see Box 2.1 and Figure 2.6B), had the backbone of the molecule located in the center, with the bases radiating outward. Franklin strongly disagreed with this on the basis of her x-ray diffraction data, insisting that the backbone must be on the outside of the molecule. At this point, Watson and Crick realized that if the bases were arranged to point inward (as Franklin had maintained), the width of DNA would start to approach the 2.0-nm diameter observed in Photograph 51.


FIGURE 2.7 Watson and Crick with their model of the DNA double helix in 1953 and 50 years later. (A) Watson (left) and Crick (right) figured out the structure of DNA by putting together all the pieces of a three-dimensional puzzle. (B) The discoverers of the DNA structure pose for a re-creation of the original picture in 2003, this time with a DNA model.


FIGURE 2.8 Rosalind Franklin and her most famous x-ray diffraction image of DNA. (A) Franklin’s work was essential to Watson and Crick in their discovery of the structure of DNA. (B) This x-ray diffraction pattern of DNA, made by Rosalind Franklin and called “Photograph 51,” was the source of some essential information about the structure of DNA.

Franklin’s diffraction measurements also showed 
a 0.34-nm distance between successive nucleotides 
on the strand, with 3.4 nm per turn of the DNA helix.


Rosalind Franklin’s x-ray diffraction studies revealed critical 
details about DNA structure that Watson and Crick were 
able to put together in constructing their model of DNA. 


Because 3.4 is exactly ten times 0.34, the model
was built with ten nucleotides per turn of the helix
(Figure 2.10). The data suggested that a DNA molecule 
is composed of two nucleotide chains wound like a 
spiral staircase around a hypothetical central axis. The 
deoxyribose-phosphate combinations form the back- 
bones of both chains, and the nitrogenous bases point 
inward. Watson and Crick’s DNA model was getting 
closer, but it was not yet right. How were the bases ori- 
ented in the center so they would fit together perfectly? 
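The helix dimensions lend themselves to a quick worked calculation. This sketch uses the idealized values of the 1953 model (0.34 nm per nucleotide, 3.4 nm per turn); measured values for DNA in solution differ slightly, and the 1,000-nucleotide strand in the second line is a hypothetical example, not a specific molecule.

$$
n_{\text{turn}} = \frac{3.4\ \text{nm per turn}}{0.34\ \text{nm per nucleotide}} = 10\ \text{nucleotides per turn}
$$

$$
L = N \times 0.34\ \text{nm}; \qquad N = 1000 \;\Rightarrow\; L = 340\ \text{nm} = 0.34\ \mu\text{m}, \quad \frac{1000}{10} = 100\ \text{turns}
$$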




FIGURE 2.9 Maurice Wilkins. Wilkins shared the Nobel Prize with 
Watson and Crick for solving the structure of DNA. 
FIGURE 2.10 DNA helix dimensions derived from x-ray diffraction. Franklin’s x-ray diffraction data revealed a 0.34-nm distance between successive nucleotides on the helix (between adjacent base pairs), 3.4 nm per turn of the helix, and a 2.0-nm diameter.
Watson and Crick Integrate Available Data
into a DNA Model 


Watson tested his cardboard cutouts, trying different combinations of base pairs to approximate the 2.0-nm diameter of DNA. At one point he had a model in which each base paired with itself: A-A, C-C, and so on. Jerry Donohue, a former student of Linus Pauling, pointed out to Watson that his model had the bases in their enol forms, as suggested by the textbooks at the time. But Donohue knew of unpublished research suggesting that the DNA bases exist in the keto form in cells. Once Watson changed the structures of the bases in the model to their keto forms, he immediately realized that the adenine and thymine bases formed an A-T base pair that fit the 2-nm diameter of DNA. The guanine and cytosine (G-C) base pair did the same. Watson and Crick knew that the proposed A-T base pair would account for Chargaff’s observation that in any DNA molecule, the amounts of adenine and thymine are equal. Watson and Crick could envision that for every adenine base on one DNA strand there must be a thymine base on the other DNA strand. Similarly, because DNA has equal amounts of guanine and cytosine, for every cytosine base on one strand there must be a guanine base on the other. As a result, Watson and Crick’s final DNA model contains pairs of bases, A-T and G-C, each of which fits perfectly into the internal diameter of the DNA helix. When the DNA model was complete, Watson and Crick had solved the universal structure of the DNA molecule: a double-stranded DNA helix containing intertwined DNA strands (Figure 2.11).


Watson and Crick certainly won the race to solve the mys- 
tery of DNA structure, but their accomplishment incorpo- 
rated data from Franklin, Wilkins, Pauling, and many other 
scientists. This story emphasizes how interactions among 
scientists and the free exchange of information play criti- 
cally important roles in promoting scientific breakthroughs. 


The two strands of DNA form a helix with two une- 
qually sized grooves, a narrow one called the minor 
groove and a wide one called the major groove. The 
two grooves wind continuously around the entire 
length of the DNA double helix (Figure 2.12). Many 
different proteins in the cell bind to the major groove in the DNA helix; which proteins bind to which stretch of DNA often depends on the specific base sequence exposed in the major groove. This type of DNA-protein binding regulates when and how genes are turned on and off (expressed) in the cell.

Given the base sequence (that is, the order of the bases) along one DNA strand, the base sequence along the complementary strand is automatically determined using simple base pairing rules: adenine base pairs with thymine (A-T), and cytosine base pairs with guanine (C-G). The DNA helix structure has an intrinsic duplication mechanism: the base sequence in one strand, the template, guides the synthesis of the bases in a second, complementary strand. We will see how this important process works in cells later in the chapter.


FIGURE 2.11 Double-stranded DNA. The atoms of this space-filling model of double-stranded DNA are colored by element. The two DNA strands of a double helix are always antiparallel; they are arranged in opposite directions. Both ends of each strand are labeled to indicate the directionality of the strands.


FIGURE 2.12 Three representations of a double-stranded DNA helix. (A) “Stick” models of DNA show the bonds that underlie the structure of DNA. (B) Spacefill DNA models show each atom and the approximate space it occupies. (C) A surface representation shows off the dimensions of the grooves along the DNA. The DNA strands coil around each other in a manner that creates two different-sized grooves along the molecule: a wide major groove and a narrow minor groove, as shown.
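The base pairing rules, combined with the 5’ to 3’ writing convention and the antiparallel arrangement of the strands, can be expressed as a short Python sketch. The names `PAIRS` and `complementary_strand` are hypothetical, and the input sequence is an arbitrary example. The sketch assumes the input is written 5’ to 3’ and contains only A, T, G, and C (any other character raises a `KeyError`). Because the partner strand runs in the opposite direction, its bases are reversed after pairing so that it, too, is written 5’ to 3’.

```python
# Watson-Crick pairing rules: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(seq_5_to_3: str) -> str:
    """Return the partner strand of a DNA sequence, also written 5' to 3'.

    Each base is replaced by its pairing partner; the result is then
    reversed because the two strands of the double helix are antiparallel.
    """
    paired = [PAIRS[base] for base in seq_5_to_3.upper()]
    return "".join(reversed(paired))


template = "TGGACTTGCCTAAGCGATA"   # arbitrary example, written 5' to 3'
partner = complementary_strand(template)
print("5'-" + template + "-3'")
print("5'-" + partner + "-3'")

# Counting bases over both strands illustrates Chargaff's observation:
# every A is matched by a T, and every G by a C.
both = template + partner
print("A:", both.count("A"), "T:", both.count("T"),
      "G:", both.count("G"), "C:", both.count("C"))
```

Applying the function twice returns the original sequence, which reflects the fact that each strand fully specifies the other.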

Watson and Crick’s paper announcing the structure of DNA opened with the line “We wish to suggest a structure for the salt of deoxyribose nucleic acid...” (Figure 2.13). The paper was greeted enthusiastically by the scientific community; the proposed DNA model was elegant in its simplicity, as illustrated by the DNA helix sketch drawn by Crick’s wife, Odile, and included in the famous paper (Figure 2.14). The proposed structure of DNA made it easy to see how DNA could provide hereditary information. Biochemists saw that the nitrogenous bases, occurring in highly variable sequences, could provide a code of heredity. The sequence was not boring and repeated (TGAC...TGAC...TGAC) as Levene’s work had suggested. Rather, the DNA base sequence was variable (TGGACTTGCCTAAGCGATA...), with the ability to encode a specific sequence of amino acids in a protein chain.


The model proposed by Watson and Crick included the 
basic elements needed for the hereditary material: variation 
in base sequence and base pairing that suggested a mecha- 
nism for duplication, and hence inheritance, of DNA. 


Data from Many Laboratories Contributed to 
Solving the DNA Structure Puzzle 


Watson and Crick were not experts in any of the scientific areas they used in constructing their DNA model. Franklin, on the other hand, was a superb crystallographer, Chargaff understood base relationships thoroughly, and numerous other scientists provided information on the chemistry of DNA. Watson and Crick achieved their goal because they were able to see the big picture and took risks in building models that they could not yet establish as correct but that fit all the available data. They took what they needed from several disciplines and used it to compose something greater than the sum of its parts. In effect, they saw the forest as well as the trees.

More than 50 years later, controversy continues 
to swirl around the relationships between Watson, 
Crick, Franklin, and Wilkins. It is clear that Franklin 
and Wilkins disliked each other and that Franklin was 
excluded from some scientific exchanges with Wilkins. 
Questions remain about who influenced whom, how 
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is a residue on each chain every 34 A. in the s-direc- 
tion. We have assumed an angle of 36° between 
adjacent residues in the same chain, so that the 
structure repeats after 10 residues on cach chain, that 
is, after 34 A. The distance of a phosphorus atom 
from the fibre ax: 10 A. As the phosphates are on 
pations have easy access to them. 

The structure is an open one, and its water content 
is rather high. At lower water contents we would 
expet the bases to tilt so that the structure could 
become more compact. 

The novel feature of the structure is the manner 
in which the two chains are held together by the 
purine and pyrimidine bases. The planes of the bases 
are perpendicular to the fibre axis. They are joined 
together in pairs, a single base from one chain being 
hydmgen-bonded to a single base from the other 
chain, so that the two lie side by side with identical 
-ordinates. One of the pair must be a purine and 
other a pyrimidine for bonding to occur. The 
hydrogen bonds are made as follows : purine position 
1 to pyrimidine position l; purine position 6 to 
pyrimidine position 6. 

If it is assumed that the bases only occur in the 
structure in the most plausible tautomeric forms | 
(that is, with the keto rather than the enol con 
figumtions) it is found that only specific pairs of 
bases can bond together, ‘These pairs aro: adenine 
(purne) with thymine yrimidine), and guanine 
(purine) with cytosine (pyrimidine). 

In other words, n adenine forms one member of i 
a pair, on either chain, then on these assumptions 
the other member must be thymine; similarly for 
guarine and cytosine. The sequence of bases on a 
single chain does not appear to be restricted in any 
way. However, if only specific pairs of bases can be 
formed, it follows that if the sequence of bases on 
one chain is giv then the sequence on the other 
chain is automatically determined. 

It has been found experimentally that the ratio
of the amounts of adenine to thymine, and the ratio
of guanine to cytosine, are always very close to unity
for deoxyribose nucleic acid.

It is probably impossible to build this structure 
with a ribose sugar in place of the deoxyribose, as 
the extra oxygen atom would make too close a van 
der Waals contact. 

The previously published X-ray data on deoxyribose
nucleic acid are insufficient for a rigorous test
of our structure. So far as we can tell, it is roughly
compatible with the experimental data, but it must
be regarded as unproved until it has been checked
against more exact results. Some of these are given
in the following communications. We were not aware 
of the details of the results presented there when we 
devised our structure, which rests mainly though not 
entirely on published experimental data and stereo- 
chemical arguments. 

It has not escaped our notice that the specific
pairing we have postulated immediately suggests a
possible copying mechanism for the genetic material. 

Full details of the structure, including the con- 
ditions assumed in building it, together with a set 
of co-ordinates for the atoms, will be published 
elsewhere.

We are much indebted to Dr. Jerry Donohue for 
constant advice and criticism, especially on interatomic
distances. We have also been stimulated by
a knowledge of the general nature of the unpublished 
experimental results and ideas of Dr. M. H. F. 
Wilkins, Dr. R. E. Franklin and their co-workers at 
Illustration reprinted with permission from Nature 171 (737-38). Copyright 1953,
Macmillan Magazines Ltd., and with the permissions of James Watson and Francis Crick. 


FIGURE 2.13 The publication of the DNA double-helix structure. This classic, one-page Nature article written by Watson and Crick accu- 
rately described the three-dimensional molecular structure of DNA for the first time. The paper included a single figure, an original sketch of 
a novel DNA double-helix structure, rendered by artist Mrs. Francis Crick. 


Franklin’s data came to be shared with Watson, and 
whether adequate credit was given to her. Although the 
questions will probably never be resolved, the
controversy gives us a glimpse of how these famous
scientists went about uncovering the truths waiting
to be discovered in nature. Readers who wish to learn
more about
the process and controversy can consult the resources 
listed at the end of this chapter. 
This figure is purely diagrammatic. The two ribbons symbolize the
two phosphate-sugar chains, and the horizontal rods the pairs of
bases holding the chains together. The vertical line marks the fibre axis.


FIGURE 2.14 Dr. Francis H. C. Crick and wife Odile. In 2003, 
Odile and Dr. Francis H. C. Crick attended a dinner in La Jolla, 
California, honoring the 50th anniversary of the discovery of the 
structure of DNA by Dr. Crick and James D. Watson. Mrs. Crick’s 
original sketch of a DNA double-helix structure appeared in the 
famous paper in the journal Nature in 1953 (right). 


The contributions to science made by Watson,
Crick, Franklin, and Wilkins were original and had
sufficient impact to merit a place in the annals of
scientific fame; the DNA double helix became the
charter molecule of molecular biology. In 1962, Watson,
Crick, and Wilkins were awarded the Nobel Prize in
physiology or medicine. Unfortunately, Franklin died
of cancer in 1958, and because the Nobel committee
does not cite individuals posthumously, Franklin did
not share in the award. However, Franklin’s
contributions were acknowledged in the Nature paper
and are now universally recognized. Recently the Royal
Society (a distinguished academy of the sciences in the
United Kingdom) established an annual award in
Franklin’s name, and in 2004 a university of health
sciences near Chicago changed its name to Rosalind
Franklin University of Medicine and Science.


In 1953, a single-page paper in Nature proposed a sim- 
ple, elegant double-helix model to explain the molecular 
structure of DNA. Nine years later, the Nobel Prize was 
awarded to Watson, Crick, and Wilkins. 


After the Nature article was published in 1953, 
evidence favoring the double-helix structure of DNA 
proposed by Watson and Crick accumulated rapidly. 
Few biologists doubted the accuracy of the molecular 
model, and most were intrigued with the implications of 
the double helix for coding genetic information within
the DNA structure. Scientists also recognized that the
double-helix architecture of paired bases on two strands
would accommodate the biochemical requirements for
DNA replication. Still, no one yet understood how the
DNA double helix in a chromosome could pass genetic
information on to the next generation. It was quite
apparent that solving the molecular structure of DNA
had not answered all the important questions about
DNA; in fact, solving this scientific puzzle was the
amazing beginning of the science of molecular biology.
With time, this knowledge gave rise to the fields we
now know as DNA science and biotechnology.


DNA REPLICATION 


The Structure of DNA Leaves an Open 
Question 


The paired bases (A-T and G-C) in the DNA double 
helix accommodate the biochemical requirements 
for reproducing or duplicating the DNA structure and 
information. Indeed, this critically important point was
not lost on Watson and Crick, who wrote one of the
greatest understatements of all time in their DNA helix
paper:


It has not escaped our notice that the specific pairing
we have postulated immediately suggests a possible copying
mechanism for the genetic material.


This is one of the most appealing features of the 
Watson—Crick DNA model; the structure of the DNA 
double helix provides insight into how the DNA mol- 
ecule duplicates. In cells, this process is called DNA 
replication. The copying and distribution of DNA is 
a key process in cells with major implications for the 
mechanisms that operate to transmit genetic informa- 
tion from one generation to the next. From the DNA 
structure it is easy to see that one strand of the dou- 
ble helix has a base sequence that unambiguously 
determines the base sequence of the opposite com- 
plementary strand (Figure 2.15). Any sequence of A, 
G, C, or T bases can reside on one DNA strand, but 
the second strand must contain the complementary 
base sequence. For example, if the base sequence in 
one strand is G-T-A-C-C-A-T..., the base sequence of 
the partner strand must be C-A-T-G-G-T-A.... For the 
DNA replication process to occur, the DNA double 
helix must unwind and the DNA strands must “unzip” 
and separate. This allows both single strands of DNA 
to become accessible to the enzymes that use each 
strand as a template to create a new, complementary 
DNA strand (Figure 2.15). 
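The pairing rule described in this paragraph is simple enough to express as a lookup table. The short Python sketch below writes out the partner strand position by position, exactly as in the G-T-A-C-C-A-T example above; the function name complement and the hyphen-separated input format are choices made for illustration only, and strand direction is deliberately ignored here.

```python
# Watson-Crick base-pairing rules: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner base for each position of a hyphen-separated strand.

    Partner bases are listed in the same left-to-right order as the input,
    matching the example in the text; strand direction is not modeled here.
    """
    partners = []
    for base in strand.upper().split("-"):
        if base not in PAIRS:
            raise ValueError(f"not a DNA base: {base!r}")
        partners.append(PAIRS[base])
    return "-".join(partners)


print(complement("G-T-A-C-C-A-T"))  # C-A-T-G-G-T-A
# Pairing is symmetric, so complementing twice restores the original strand.
print(complement(complement("G-T-A-C-C-A-T")))  # G-T-A-C-C-A-T
```

Because every base has exactly one partner, knowing one strand is enough to rebuild the other, which is the property replication relies on.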
FIGURE 2.15 Overview of DNA replication. (A) The two DNA strands (red backbones) in the parent DNA double helix are both used as
templates by DNA polymerase and are copied into new DNA strands (gold backbones). (B) Any sequence of A, G, C, or T bases can reside on
the DNA template strand (red), but after replication, the new DNA strand (blue) must have the complementary base sequence.
FIGURE 2.16 Originally proposed mechanisms of DNA replication. The proposed DNA double-helix structure did not immediately explain the
complete mechanism for how the DNA helix replicates to produce two DNA helix molecules. Three possible mechanisms of DNA replication were
proposed: (1) each parent strand combines with the complementary new strand to re-form a double helix (semiconservative), (2) the parent
strands reunite with each other, leaving the newly synthesized strands to form the second double helix (conservative), or (3) each strand of both
daughter helices contains interspersed segments of parent and newly synthesized DNA (dispersive). Meselson and Stahl designed an elegant
experiment to determine the correct answer.


Meselson and Stahl Answer the Question 


The DNA double-helix model left another important ques- 
tion unanswered, because the structure of DNA by itself 
did not automatically predict a specific mechanism used 
by the cell to copy (replicate) the DNA helix. Scientists 
suggested three possible ways that DNA might be repli- 
cated in the cell, called the dispersive, conservative, and 
semiconservative modes of DNA replication (Figure 2.16). 


The structure of DNA suggested base pairing as a gen- 
eral mechanism for DNA replication. However, the proc- 
ess by which each strand of the double helix was copied 
remained to be discovered. 


The definitive experiment to address this question
was published in 1958 by Matthew Meselson
and Franklin W. Stahl of the California Institute of
Technology, who figured out a way to experimentally 
distinguish among the three possible modes of DNA 
replication. The Meselson and Stahl DNA replication 
experiment is one of the best demonstrations of how a 
key scientific theory or hypothesis can be conclusively 
resolved by a simple, well-designed, and definitive 
experiment (Figure 2.17). 

To start this experiment, Meselson and Stahl grew 
bacteria in liquid medium containing a “heavy
isotope” of nitrogen called 15N. At each round of
bacterial cell division, as the DNA was replicated, the heavy
15N isotope became incorporated into the bacterial 
FIGURE 2.17 Overview of the Meselson and Stahl experiment. (A) Bacteria are grown in 15N-containing medium until virtually all of the
nitrogen in DNA is 15N. Then they are grown in 14N-containing medium. Samples of DNA are isolated at each stage. (B) The DNA samples
are centrifuged in a salt density gradient. During centrifugation, the DNA moves through the salt gradient until the density of the DNA
and that of the salt solution are equal. The more 15N the DNA contains, the denser it is, and the farther it moves down the gradient.


chromosome DNA, instead of the normal isotope, 14N.
As a result, the DNA in this population of bacteria
was heavier than normal DNA because the heavy 15N
atoms have replaced the lighter 14N atoms normally
found in the nitrogenous bases. These two types of 
DNA, “heavy” (H) and “light” (L), have different densi- 
ties and can be physically separated by density gradient 
centrifugation methods (Figure 2.17). 
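How much heavier is fully labeled DNA? A rough mass estimate helps show why the difference is small but still measurable. This Python sketch relies on the fact that nitrogen occurs only in the bases (adenine and guanine have five N atoms each, cytosine three, thymine two) and uses standard atomic masses; the average base-pair mass of about 650 daltons is an assumption, not a figure from this chapter. It estimates mass only, not buoyant density, so treat the result as an order-of-magnitude illustration.

```python
# Nitrogen atoms per base; deoxyribose and phosphate contain no nitrogen.
N_ATOMS = {"A": 5, "G": 5, "C": 3, "T": 2}

MASS_14N = 14.0031  # atomic mass of 14N, daltons
MASS_15N = 15.0001  # atomic mass of 15N, daltons
AVG_BP_MASS = 650.0  # ASSUMED average mass of one base pair, daltons


def heavy_shift_percent(pair: str) -> float:
    """Percent mass gain of one base pair when every N atom is 15N, not 14N."""
    n_atoms = sum(N_ATOMS[base] for base in pair)
    extra_mass = n_atoms * (MASS_15N - MASS_14N)
    return 100.0 * extra_mass / AVG_BP_MASS


for pair in ("AT", "GC"):
    print(f"{pair} pair: about {heavy_shift_percent(pair):.2f}% heavier")
```

Under these assumptions each base pair gains only about 1 percent in mass, which is why a sensitive method such as density gradient centrifugation is needed to separate heavy from light DNA.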

First, Meselson and Stahl grew the bacteria in 
medium containing the heavy isotope long enough that 
virtually all DNA helices in the bacteria would contain 
heavy isotopes in both DNA strands (H/H). This was the 
starting point of the experiment (Figure 2.18A and B). 
After removing aliquots (samples) at this stage of the 
experiment, the bacteria growing in 15N medium were
then transferred to fresh growth medium containing
only the light 14N isotope and were allowed to undergo
a single round of bacterial cell division (during which 
the DNA in the cells replicates once and only once). 
Aliquots of the growing bacteria were removed for 
analysis. Then, following more rounds of cell divi- 
sion, additional aliquots were removed. The DNA was 
isolated from cells in the aliquots for density gradient 
analysis. 

When they analyzed the aliquots (Figure 2.18B),
Meselson and Stahl confirmed that the DNA from the
starting culture was heavy (H/H) because it contained
only the heavy isotope 15N. In the second set of
samples, taken after the bacteria had reproduced once
in the light-isotope (14N) growth medium, they found
a different form of DNA with a lower density. Most
interestingly, the scientists noted that the newly made
DNA was not as light as the DNA containing only 14N
(L/L). Rather, the new DNA species had an intermediate
density falling between the all-heavy DNA (H/H)
(containing only 15N) and the all-light DNA (L/L)
(Figure 2.18). This ruled out the conservative model,
in which the two new DNA strands form an entirely new
DNA molecule (L/L) while the two parent DNA strands
re-form a second molecule (H/H).

After another round of bacterial reproduction takes 
place, in addition to some intermediate-density (L/H) 
DNA, the density analysis revealed that the bacteria 
now formed light-density DNA (L/L) containing only 
14N (Figure 2.18B). The light-density (L/L) DNA
represented the molecule formed from the 14N (L)
parent strand of the L/H DNA and a newly synthesized
DNA strand using the 14N in the culture medium. This
result indicated decisively that a replicated double 
helix contains a parent DNA strand and a new DNA 
strand. Neither the conservative nor the dispersive 
mechanism (Figure 2.16) could explain these results, 
whereas the semiconservative mechanism fit the data 
perfectly. 
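The logic of the experiment can be made explicit by tabulating what each model in Figure 2.16 predicts. The Python sketch below is idealized: it assumes the starting DNA is completely 15N-labeled and that every molecule replicates exactly once per generation in 14N medium. The function name predicted_bands and the band labels are illustrative choices, not part of the original study.

```python
from fractions import Fraction

MODELS = ("semiconservative", "conservative", "dispersive")


def predicted_bands(model: str, generations: int) -> dict[str, Fraction]:
    """Predicted share of DNA in each density band, starting from H/H DNA.

    The culture is assumed to switch to 14N medium at generation 0, and every
    molecule is assumed to replicate exactly once per generation.
    Band labels follow the text: "H/H" heavy, "H/L" intermediate, "L/L" light.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if generations < 0:
        raise ValueError("generations must be >= 0")
    heavy_share = Fraction(1, 2 ** generations)  # fraction of N atoms still 15N
    if generations == 0:
        return {"H/H": Fraction(1)}
    if model == "semiconservative":
        # The two original strands end up in two hybrid molecules out of 2**g.
        bands = {"H/L": 2 * heavy_share, "L/L": 1 - 2 * heavy_share}
    elif model == "conservative":
        # The parent helix stays intact; all other molecules are entirely new.
        bands = {"H/H": heavy_share, "L/L": 1 - heavy_share}
    else:
        # Old and new DNA are spread evenly, so all molecules share one density.
        bands = {f"one band, {heavy_share} of N as 15N": Fraction(1)}
    return {band: share for band, share in bands.items() if share > 0}


for model in MODELS:
    for gen in (1, 2):
        shares = {band: str(s) for band, s in predicted_bands(model, gen).items()}
        print(model, "generation", gen, shares)
```

Only the semiconservative model predicts a single intermediate band after one generation and equal intermediate and light bands after two. The dispersive model also gives one intermediate band after the first generation, which is why the second generation was needed to tell the two apart.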

The DNA replication mechanism suggested by the 
Watson—Crick DNA helix model and confirmed by 
Meselson and Stahl was termed semiconservative rep- 
lication because within each new DNA double helix, 
one of the two DNA strands (“semi”) is from the
original double helix (“conserved”). Semiconservative
DNA replication occurs in all cells before they divide
(in eukaryotic cells, during the S phase of the cell cycle,
before mitosis). Each “parent” strand of DNA acts
FIGURE 2.18 Interpreting the results of the Meselson and Stahl experiment. (A) Initial comparison in density gradient centrifugation of DNA
from normal and experimental growth conditions. Bacteria were grown in medium with the normal “light” isotope of nitrogen (14N), which
was incorporated into the bacterial chromosome DNA as the cells divided and the DNA replicated. Their DNA was compared with that of
bacteria grown in “heavy,” 15N-containing medium. Density gradient centrifugation was used to physically separate and identify the different
DNA species made in the cells. 15N DNA (H/H) is denser than normal DNA because the heavy 15N atoms replace the lighter 14N atoms in
both strands of DNA. As a result, it migrates farther during centrifugation. (Lighter indicates lower density, heavier indicates higher density.)
(B) The H/H bacteria were grown for one generation at a time with 14N, the light isotope of nitrogen. DNA was extracted from sample aliquots
after each generation and centrifuged for comparison of density. The density of the DNA decreased with successive generations.
The percentage beneath each generation indicates the relative amount of DNA in each band.


as a template for the synthesis of a new DNA strand. 
In effect, the parent strand dictates that a nucleotide 
containing adenine (A) must be placed opposite one 
containing thymine (T) (and vice versa); it dictates that 


a nucleotide containing guanine (G) will be positioned 
opposite one containing cytosine (C) (and vice versa). 
These “base-pairing rules” are the key to the secret of 
how the DNA helix passes on genetic information. 
Meselson and Stahl designed and executed a clever and
efficient experiment to discriminate among three possible
means of DNA replication. Their results indicated that
each strand of the parent DNA helix remains intact and 
becomes paired with an entirely new strand, in a process 
called semiconservative replication. 


FIGURE 2.19 Father and son Nobel laureates. Arthur Kornberg (left) 
with his son Roger (right) in 2006. Jamie Kripke Photography. 


DNA Polymerases Copy Complementary 
DNA Strands from DNA Templates 


When DNA replication occurs, a DNA polymerase 
enzyme reads the template DNA base sequence and 
adds the complementary nucleotide bases to the new, 
growing DNA strand one by one. In the 1950s, Arthur 
Kornberg (Figure 2.19) discovered a DNA polymerase
enzyme in E. coli bacteria and showed that DNA
polymerase could synthesize DNA outside a cell
using building block components provided in a test 
tube. This was an awesome accomplishment. In the 
presence of Roger, his then 12-year-old son, Arthur 
Kornberg received the 1959 Nobel Prize in physiology 
or medicine for his groundbreaking work on DNA
synthesis. Kornberg’s research characterized replication,
the process by which DNA is copied into duplicate strands
when cells divide. This universal process in all cells is
essential to understanding how genetic information is 
transferred from parent cells to progeny cells. Arthur 
returned to Stockholm again in 2006, but this time the 
Nobel Prize in chemistry went to his son, for Roger’s 
cutting-edge work showing how the genetic informa- 
tion in DNA is copied into a messenger RNA (mRNA) 
strand using the process of transcription (see Chapter
3). The cell absolutely requires transcription for life:
if transcription stops, the cell dies. This is how certain
poisonous mushrooms kill; they contain a toxin
(α-amanitin) that blocks transcription in the cells.

As it turned out, Arthur Kornberg’s DNA polymerase
is not the principal enzyme used to replicate the
genome DNA in the cell. We now know Kornberg’s
enzyme as E. coli DNA polymerase I, one of five
DNA polymerases in E. coli cells (Table 2.2 summarizes
three of them); the principal replicating enzyme is
DNA polymerase III. All organisms contain DNA
polymerases, which range from small monomer pro- 
teins to enormous multiprotein complexes. Smaller 
DNA polymerase enzymes are often involved in repair- 
ing damaged DNA (see Chapter 9), whereas larger 
multiprotein polymerase complexes participate in the 
replication of genome DNA. Although some aspects of 
DNA replication vary among prokaryotic and eukaryo- 
tic cells and viruses, the chemical mechanism and the 
protein machinery involved in DNA replication are 
highly conserved. 


TABLE 2.2 Summary of properties of three of the five DNA polymerase enzymes in E. coli


                                                        E. coli DNA polymerase
Property                                                I              II              III

E. coli gene for the polymerase subunit                 polA           polB            polC
Number of subunits                                      1              >4              >10
Proofreading (3’ to 5’) exonuclease activity            yes            yes             yes
5’ to 3’ exonuclease activity                           yes            no              no
Polymerization rate (nucleotides added per second)      16-20          5-10            250-1000
Processivity (nucleotides added before dissociation)    low (3-200)    high (10,000)   very high (500,000)


Source of table: http://www.mun.ca/biochem/courses/3107/Topics/DNA_polymerases.html.
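The polymerization rates in Table 2.2 help explain why polymerase III, not polymerase I, does most of the copying. The Python sketch below assumes an E. coli chromosome of about 4.6 million base pairs (a figure not given in this chapter) copied by two forks moving away from a single origin, and it ignores processivity, pauses, and everything else happening at the fork, so the results are only order-of-magnitude estimates. The function name copy_time_hours is illustrative.

```python
# Polymerization rates from Table 2.2 (nucleotides added per second).
RATES = {"I": (16, 20), "II": (5, 10), "III": (250, 1000)}

GENOME_BP = 4_600_000  # ASSUMED approximate size of the E. coli chromosome
FORKS = 2              # two forks move away from a single origin


def copy_time_hours(rate_nt_per_s: float) -> float:
    """Hours to replicate the whole genome if each fork moves at this rate."""
    seconds = GENOME_BP / (FORKS * rate_nt_per_s)
    return seconds / 3600


for pol, (slow, fast) in RATES.items():
    print(f"Pol {pol}: {copy_time_hours(fast):.1f} to {copy_time_hours(slow):.1f} h")
```

At polymerase I speeds the chromosome would take more than a day to copy, whereas polymerase III rates bring the estimate down to well under a few hours.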
The Replication Fork Allows Both DNA
Strands to be Synthesized in the 5’ to 3’ Direction


The relatively simple idea of replicating the DNA helix
by separating the DNA strands and copying the template
to make complementary DNA strands is only a
starting point for understanding the process of DNA
replication. Along with matching the incoming
nucleotides to the template strand, the polymerase forges a
bond between the existing strand and the new
nucleotide. The energy for this reaction comes from the
removal of two phosphates from the incoming
nucleotide (Figure 2.20). In fact, DNA polymerases can only
add nucleotides to the 3’ end of a DNA strand.

Some details of the replication process are key in
making extremely high-fidelity (almost error-free) DNA
copies. The overall rate of errors in eukaryotic DNA
replication is only about one in a billion bases. Part of the
reason for this is that DNA polymerases are good at
correcting their own errors using proofreading and repair
activities. DNA polymerases cannot start a new strand on
their own; they can only add onto an existing RNA primer,
a short strand of RNA that is base paired to the template
DNA strand at the point on the helix where DNA
replication is to start. The primer has a hydroxyl group
(-OH) at its 3’ end, providing a site for the DNA polymerase
to add the next nucleotide (Figure 2.20B). The RNA primer
is removed later by an additional enzyme and replaced
with DNA bases.

As a consequence of the requirement for a primer
with a 3’ -OH group, DNA polymerase enzymes can
copy a DNA template in only one direction, adding
new nucleotide units on to the 3’ end and synthesizing
new DNA strands in the 5’ to 3’ direction only. If the
polymerase worked on the 5’ end of the DNA instead,
the high-energy bond would be on the growing strand,
not on the incoming nucleotide. That would not
allow for correction of mistakes, because if an incorrect
nucleotide were removed from the end of the strand, its
triphosphate would be removed with it, and there would no
longer be a high-energy bond available for adding the next
nucleotide. Thus, the directionality of DNA strand synthesis
contributes to the fidelity of replication.


The initiation of DNA replication requires an RNA primer
as a starting point to provide a 3’ -OH for DNA polymerase
to extend with additional nucleotides into a new DNA
strand. Replication proceeds only in the 5’ to 3’ direction.
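The 5’ to 3’ rule and the need for a primer can be illustrated with a small string model. In this Python sketch both strands are written 5’ to 3’, the primer is written with DNA letters for simplicity (a real primer is RNA, with U in place of T), and the function extend_primer is hypothetical: it only mimics the bookkeeping of adding one complementary base at a time to the primer’s 3’ end, not the chemistry.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template: str, primer: str) -> str:
    """Extend a primer along a template, adding bases only at its 3' end.

    Both strings are written 5' -> 3'. Because the strands are antiparallel,
    the primer pairs with the 3' end of the template; the template is read
    3' -> 5' while the new strand grows 5' -> 3'.
    """
    # Partner of each template base, listed starting from the template's 3' end.
    partner = "".join(PAIRS[base] for base in reversed(template))
    if not partner.startswith(primer):
        raise ValueError("primer does not pair with the 3' end of the template")
    new_strand = primer
    while len(new_strand) < len(template):
        # The next template base to read lies just beyond the current 3' end.
        next_template_base = template[len(template) - 1 - len(new_strand)]
        new_strand += PAIRS[next_template_base]  # add to the 3' end only
    return new_strand


print(extend_primer("TTGACCGA", "TCG"))  # TCGGTCAA
```

Because the template is read from its 3’ end while the new strand grows from its 5’ end, the finished strand is the complement of the template written in reverse order, which is what "antiparallel" means in practice.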
FIGURE 2.20 Incoming nucleotides are added to the 3’ OH during replication. (A) To start to copy DNA, the DNA polymerase enzyme
needs a primer with an available 3’ OH. Thus the DNA strand grows in the 5’ to 3’ direction only. The two phosphates on the end of the
incoming nucleotide are removed as pyrophosphate, providing energy for the formation of a bond between the two nucleotides. (B) The
chemical structure of the DNA backbone is shown as a C nucleotide is added onto the 3’ end of the DNA, base-pairing with a G nucleotide.
The 3’ OH plays a role in breaking the bond between the phosphates of the incoming deoxyribonucleoside triphosphate (dNTP).
During DNA replication, many enzymes and proteins
work together at the site of replication, called the
replication fork, as it moves along the length
of the DNA molecule. A very simplified view of the
replication fork (Figure 2.21A) highlights a problem
posed by the fact that DNA polymerases synthesize
DNA strands only in the 5’ to 3’ direction. Because the
two strands of a DNA double-helix molecule are
antiparallel to each other, it would seem that only one
strand could be synthesized as the fork advances. The
cell solves this problem using two DNA polymerase
enzymes: one enzyme works continuously, synthesizing
the leading strand in the 5’ to 3’ direction as the fork
moves, and the other enzyme performs DNA synthesis
in a fragmentary fashion along the lagging strand.

A more complex model (Figure 2.21B) shows 
that the replication fork is the site of a complicated 
choreography. A multi-protein polymerase com- 
plex includes both polymerases, clamp proteins that 
help keep the polymerases on the DNA, and a heli- 
case enzyme that separates the two parent strands of 
DNA as the replication fork moves forward. On the 
leading strand template, the DNA polymerase com- 
plex performs continuous replication moving toward 
the advancing replication fork. However, the parent 
template for the lagging strand forms a large loop
that feeds into the polymerase complex. On the
lagging strand, a polymerase performs discontinuous
replication and a primase enzyme repeatedly lays
down short RNA primers to provide a 3’ -OH for
adding nucleotides (on the leading strand, a primer
is needed only where replication starts). The
primers are subsequently removed, and DNA polymerase
fills in the resulting gaps. The resulting DNA strands,
called Okazaki fragments, are joined to each other
by a DNA ligase enzyme to complete lagging strand
synthesis. The large loop made for the lagging DNA
strand allows for coordination of the activities of the
DNA polymerases at the constantly moving
replication fork, placing the two actively replicating regions
of DNA adjacent to each other.


The replication fork is the site of DNA synthesis on both
strands of a DNA double helix. DNA polymerase enzymes
at the fork add nucleotides to each growing strand of DNA
in the 5’ to 3’ direction. One strand therefore is synthesized
continuously; the other is synthesized as small fragments
that are subsequently joined together.


FIGURE 2.21 DNA replication occurs at the replication fork. (A) A simplified schematic diagram shows the two parent strands of DNA
opened for replication, forming the replication fork. A DNA polymerase operates on each strand, synthesizing a new DNA strand moving in
the 5’ to 3’ direction. One polymerase creates one continuous new strand, the leading strand. On the opposite strand, a discontinuous series of
Okazaki fragments is created on the lagging strand. The resulting Okazaki fragments are sealed into a continuous DNA strand by a DNA ligase
enzyme. (B) A more detailed diagram of the replication fork shows the loop that forms enabling a polymerase dimer to copy both strands as
a unit. Also shown are sliding clamp proteins that help the polymerases to stay on the DNA, a helicase enzyme that separates the two parent
strands, and a primase enzyme that lays down the RNA primers to initiate synthesis of the Okazaki fragments on the lagging strand.
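The extra work on the lagging strand can be sketched as simple bookkeeping. This Python example assumes a fixed Okazaki fragment length (roughly 1,000 to 2,000 nucleotides is typical in bacteria and roughly 100 to 200 in eukaryotes; these figures are not from this chapter) and uses hypothetical function names. It counts the primers and ligation steps the lagging strand needs over a stretch of template, compared with the single primer the leading strand needs at the origin.

```python
def okazaki_fragments(region_nt: int, fragment_nt: int) -> list[tuple[int, int]]:
    """Split a stretch of lagging-strand template into (start, end) pieces.

    Coordinates are 0-based with `end` exclusive; the last piece may be short.
    A fixed fragment length is an ASSUMPTION; real fragments vary in length.
    """
    if region_nt <= 0 or fragment_nt <= 0:
        raise ValueError("lengths must be positive")
    return [(start, min(start + fragment_nt, region_nt))
            for start in range(0, region_nt, fragment_nt)]


def lagging_strand_steps(region_nt: int, fragment_nt: int) -> dict[str, int]:
    """Primers laid down and ligase joints needed to finish the lagging strand."""
    pieces = okazaki_fragments(region_nt, fragment_nt)
    # Each fragment starts from its own RNA primer; DNA ligase seals each
    # junction between neighboring fragments.
    return {"primers": len(pieces), "ligations": len(pieces) - 1}


# The leading strand over the same stretch needs just one primer (at the
# origin) and no ligations inside the stretch.
for label, size in [("bacteria-like, 1,500 nt", 1_500), ("eukaryote-like, 200 nt", 200)]:
    print(label, lagging_strand_steps(10_000, size))
```

Shorter fragments mean many more priming and sealing events for the same length of DNA, which is one reason the primase, polymerase, and ligase activities must be tightly coordinated at the fork.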
Box 2.2 DNA Replication Is a High-Fidelity Process That Makes Nearly Error-Free DNA Copies


The accuracy of DNA synthesis is very important for correctly 
transmitting genetic information over many generations and 
for avoiding mutations that can initiate and promote diseases 
such as cancer. Studies on E. coli chromosome replication 
show that the error rate of the bacterial replication machinery 
in vivo (in the cell) is very low, making only 1 error in every 
10 million to 100 million bases copied into DNA. It is impres- 
sive that in a growing bacterium, the DNA replication enzyme 
complex moves like a locomotive along the DNA helix at the 
astonishing rate of 500 nucleotides (bases) per second, simul- 
taneously reading the DNA template and incorporating the 
complementary nucleotides into the growing strand. DNA 
replication in eukaryotic cells is even more accurate, in part 
because of safeguards that operate before and after DNA rep- 
lication in the cell cycle (see Chapter 9 on cell fate). Believe 
it or not, DNA polymerases not only proofread their work to 
find errors, but they also have the ability to actually correct 
the replication mistakes made in the genome DNA before the 
enzyme moves farther down the DNA template. 

The DNA polymerase proofreading activity can detect
and repair replication mistakes quickly. E. coli DNA
polymerase I contains both a 5’ to 3’ exonuclease activity
and a 3’ to 5’ exonuclease activity (Table 2.2), and both
are involved in maintaining DNA accuracy and integrity.
DNA polymerase I can be cleaved by certain enzymes into
two smaller proteins. The larger of these two, called the
Klenow fragment, has been studied closely and retains
both the ability to synthesize DNA and the 3’ to 5’
proofreading exonuclease activity (Figure 2.22). The
three-dimensional structure of the Klenow fragment is shaped
approximately like a right hand, which helps people to better
visualize how a polymerase functions (Figure 2.23).
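The cycle in Figure 2.22 (a wrong base is added, excised by the 3’ to 5’ exonuclease, and replaced) can be explored with a toy simulation. In this Python sketch, p_wrong (the chance of inserting a mismatched base) and p_caught (the chance that proofreading removes it) are made-up parameters, not measured values; p_wrong is deliberately set far higher than a real polymerase so that errors appear in a short run. The function names are hypothetical.

```python
import random


def copy_with_proofreading(n_bases: int, p_wrong: float, p_caught: float,
                           seed: int = 0) -> int:
    """Simulate copying n_bases and return how many mistakes remain.

    p_wrong:  ASSUMED chance that the polymerase inserts a mismatched base
    p_caught: ASSUMED chance that the 3'-to-5' exonuclease removes that
              mismatch before synthesis moves on
    A removed base is simply tried again, as in Figure 2.22.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_bases):
        while True:
            if rng.random() >= p_wrong:
                break            # correct base added; move to the next position
            if rng.random() >= p_caught:
                errors += 1      # mismatch escaped proofreading
                break
            # otherwise the mismatch was excised; try this position again
    return errors


def expected_error_rate(p_wrong: float, p_caught: float) -> float:
    """Per-base chance that a mismatch survives, from the same retry model."""
    return p_wrong * (1 - p_caught) / (1 - p_wrong * p_caught)


n = 200_000
print("without proofreading:", copy_with_proofreading(n, 1e-3, 0.0))
print("with proofreading:   ", copy_with_proofreading(n, 1e-3, 0.99))
print("expected with proofreading:", round(n * expected_error_rate(1e-3, 0.99), 1))
```

In this model, proofreading that catches 99 percent of mismatches cuts the surviving error rate roughly a hundredfold, illustrating how an editing step multiplies the accuracy of base selection.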
FIGURE 2.23 DNA polymerase has a “hand” in DNA replication
and proofreading. The Klenow fragment DNA polymerase can 
be visualized as a partially open hand with thumb, fingers, and 
palm domains. The DNA helix is cradled in the hand of the pro- 
tein, with the palm domain making a plate at the bottom of the 
cleft formed by the thumb and fingers. The polymerase activity 
takes place in the cleft at the base of the fingers and the thumb. 
The 3’ to 5’ exonuclease site is adjacent to the palm domain, 
and contains an incorrectly added base at the 3’ end of the DNA 
strand being synthesized (purple). This is the site that removes 
an incorrectly paired nucleotide. The template DNA strand 
(beige) was truncated in order to crystallize the complex for x-ray 
diffraction. 
FIGURE 2.22 DNA polymerase has a 3’ to 5’ proofreading activity that corrects mistakes. When DNA polymerase adds an incorrect base
to the 3’ end of the growing DNA strand, the 3’ to 5’ exonuclease activity removes the incorrect base and the DNA polymerase then
continues synthesis by adding the correct base.
Replication Origins in DNA Control the
Start of DNA Synthesis 


It is essential that DNA synthesis be closely regulated to 
ensure that the cell does not replicate the genome DNA 
more than once per cell cycle. Creating additional cop- 
ies of the genome could have disastrous effects on gene 
expression resulting in the death of the organism. An 
important step in controlling DNA replication (Chapter 
9) occurs when the cell “decides” to initiate DNA syn- 
thesis. Replication of the bacterial chromosome (Figure 
2.24) begins at a specific region of DNA called the ori- 
gin of replication (ori). A collection of proteins controls 
the use of the ori DNA. To initiate replication of a DNA 
helix, proteins separate the DNA strands in the ori region, 
which contains a high percentage of A-T base pairs. 
(A-T base pairs form only two hydrogen bonds, compared
to G-C pairs, which form three, so it is easier to
separate “A-T rich” DNA strands.) Once DNA replication 
has been initiated, two replication forks proceed in oppo- 
site directions away from the point of origin in the DNA. 
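The hydrogen-bond argument for A-T-rich origins can be made concrete by counting bonds along a stretch of duplex. In this Python sketch the two example sequences are invented for illustration (they are not real origin sequences), and counting hydrogen bonds is a simplification that ignores base stacking and other factors affecting how easily strands separate. The function name duplex_stats is hypothetical.

```python
# Hydrogen bonds per base pair, as stated in the text.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def duplex_stats(strand: str) -> tuple[int, float]:
    """Return (hydrogen bonds holding the duplex together, A-T fraction).

    `strand` is one strand of the duplex; its partner is implied by base
    pairing, so each base stands for one base pair.
    """
    strand = strand.upper()
    bonds = sum(H_BONDS[base] for base in strand)
    at_fraction = sum(base in "AT" for base in strand) / len(strand)
    return bonds, at_fraction


for name, seq in [("A-T rich", "TTATAATATTTA"), ("G-C rich", "GCGGCCGCGGCG")]:
    bonds, at = duplex_stats(seq)
    print(f"{name}: {bonds} hydrogen bonds, {at:.0%} A-T")
```

Two stretches of the same length can differ by half again as many hydrogen bonds, so proteins need less energy to open an A-T-rich region such as ori.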
In humans, DNA replication is particularly impres- 
sive because of the massive size of the genome com- 
pared to bacteria. Every time a human cell divides, 
the billions of DNA bases distributed over 46 chromo- 
somes must be quickly and accurately replicated to 
produce enough chromosomes for correct cell division. 
Eukaryotic chromosomes have multiple replication ori- 
gins distributed along each DNA genome (Figure 2.25). 
The cells must control the rate at which DNA replica- 
tion initiates at each origin on each chromosome. The 
goal is to restrict duplication of the cell genome to only 


s tt. ? * 
eee z : _~z Daughter strand 
A a aN 
Bret: A Replication 
nae va -~ fork NX 
si ai 
7 z 
NAR s 
poo: | 
es f Replication 
\ tee fork 
Cag Be, 
my Parent strand 
Ld 
100 um 


(B) 


FIGURE 2.24 E. coli chromosome replication. (A) Replication of the circular bacterial chromosome starts at a specific site (ori, the origin of 
replication). Two replication forks are formed by separating the strands of the double helix, and they proceed in opposite directions around the circular 
molecule. (B) The electron micrograph on the left shows the circular E. coli chromosome during the process of replication, diagrammed at the right. 
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FIGURE 2.25 Eukaryotic genomes have multiple DNA replication origins. (A) In eukaryotes DNA replication initiates at many sites along the 
very long chromosome DNA molecule, making many small replication bubbles (1); bidirectional DNA replication increases sizes of bubbles (2), 
and eventually the bubbles resolve into two DNA helices (two daughter DNA molecules) (3). (B) Electron micrograph shows three replication 
bubbles along chromosome DNA in Chinese hamster cells. (Red arrows show the direction of movement of the replication forks in each bubble.) 
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FIGURE 2.26 Keeping perspective: How is DNA replication related 
to gene expression? Replication insures the perpetuation of genetic 
information in an organism as it grows and develops, and between 
generations of the organism. In contrast, gene expression involves 
making use of the stored genetic information, via the processes of 
transcription and translation. 


once per cell cycle. The timing and temporal order of 
initiation at the hundreds of DNA replication origins 
in the genome are regulated by specialized replication 
control proteins (see the discussion in Chapter 9 on cell 
fate [includes cell cycle] and S phase [DNA synthesis]). 
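Some rough arithmetic shows why multiple origins matter. The Python sketch below assumes a hypothetical chromosome of 100 million base pairs, a constant fork speed of 2,000 base pairs per minute (an illustrative value chosen for easy numbers, not a figure from this chapter), and origins that are evenly spaced and all fire at the same moment, each sending two forks in opposite directions. Real cells fire their origins at different times, so this is only a simplified model; the function name is hypothetical.

```python
def replication_time_minutes(length_bp, n_origins, fork_rate_bp_per_min):
    """Estimate how long it takes to copy a linear DNA molecule.

    Simplifying assumptions: origins are evenly spaced, all fire at the
    same moment, and every fork moves at the same constant rate.
    """
    if n_origins < 1:
        raise ValueError("a chromosome needs at least one origin")
    # Each origin sits in the middle of its own stretch of DNA ...
    stretch_bp = length_bp / n_origins
    # ... and its two forks move outward in opposite directions,
    # so each fork copies only half of that stretch.
    distance_per_fork = stretch_bp / 2
    return distance_per_fork / fork_rate_bp_per_min


CHROMOSOME_BP = 100_000_000  # hypothetical chromosome length
FORK_RATE = 2_000            # assumed base pairs copied per fork per minute

for origins in (1, 10, 100, 1_000):
    minutes = replication_time_minutes(CHROMOSOME_BP, origins, FORK_RATE)
    print(f"{origins:>5} origins: {minutes:>8,.0f} minutes")
```

Under these assumptions a single origin would need 25,000 minutes (roughly 17 days), while 1,000 evenly spaced origins finish in about 25 minutes.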


The cell must regulate the timing of genome replication 
during each cell cycle. Proteins bind to the replication 
origin(s) in the genome DNA to control the initiation of DNA 
synthesis. 


Frequently during our exploration of cell func- 
tion it is important to touch base and recognize how 
each specialized process, such as DNA replication, fits 
into the big picture of what is going on inside the cell 
(Figure 2.26). Accurate chromosome duplication and 
transmission are key processes that must be completed 
correctly and at the correct time during the cell’s 
growth cycle, or the cell will not survive. 


SUMMARY 


Scientists wanted to solve the structure of DNA because 
they knew that the DNA molecule would provide 
important clues to the function of DNA and reveal 
information about how DNA replicates every time the 
cell divides. In the 1920s, Phoebus Levene and his col- 
leagues identified the three basic components of DNA: 
a five-carbon sugar named deoxyribose (ribose in RNA); 
phosphate groups; and the four nitrogenous bases, 
adenine (A), thymine (T), guanine (G), and cytosine 
(C). These components make up the nucleotides, the 
building blocks of a DNA strand. Studies by Chargaff 
indicated that the amounts of the DNA bases vary in 
different organisms, suggesting that organisms may be 
different because their DNAs are different. Regardless 
of the source of the DNA, however, Chargaff found that 
the amount of adenine (A) always equals the amount 
of thymine (T) and the amount of guanine (G) always 
equals the amount of cytosine (C). These and other 
experimental data made essential contributions to solv- 
ing the puzzle of DNA structure, which was published 
by Watson and Crick in 1953. Using the x-ray diffraction 
photographs of DNA obtained by Franklin and Wilkins 
and incorporating chemical data on DNA, Watson and 
Crick proposed that DNA is a double-stranded helical 
molecule, the DNA double helix, with each adenine (A) 
paired to a thymine (T) and each guanine (G) paired to 
a cytosine (C). 
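Chargaff's equalities follow directly from base pairing: every A on one strand has a T opposite it, and every G has a C. The short Python sketch below uses a made-up eight-base sequence to count bases in a single strand and then in the full double helix. The function names are illustrative, not taken from any library.

```python
from collections import Counter

# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand):
    """Return the complementary strand, read 5'->3' (the reverse complement)."""
    return "".join(PAIR[base] for base in reversed(strand))


def base_composition(strand):
    """Count each base in a double helix built from `strand` and its partner."""
    return Counter(strand + partner_strand(strand))


one_strand = "AAAATGGC"              # hypothetical sequence
print(Counter(one_strand))           # one strand alone: A and T need not match
print(base_composition(one_strand))  # both strands together: A == T and G == C
```

A single strand can have very different amounts of A and T, but once its partner strand is included the counts always match. The ratio of A+T to G+C can still differ from one sequence to another, which is consistent with Chargaff's finding that base composition varies among organisms.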

The arrangement of bases paired between the two 
DNA strands of the double helix suggested a possi- 
ble replication mechanism in which each DNA strand 
carries a base sequence that intrinsically encodes 
its complementary strand. In 1957, Meselson and 
Stahl showed that DNA replication proceeds using a 
semiconservative mechanism where one of the paren- 
tal DNA strands is conserved in each of the newly 
formed DNA double helices. In each case, the paren- 
tal DNA strand serves as a template for the synthe- 
sis of a brand-new, complementary strand of DNA, 
while also becoming paired with the new strand. 
DNA polymerase is the type of enzyme that copies 
DNA into DNA when the cell divides. During DNA 
replication, the DNA polymerase enzyme reads the 
template base sequence and adds the correct comple- 
mentary nucleotide base to the growing DNA strand. 
Because DNA polymerase can add nucleotides only to 
the 3’ end of a growing strand, it copies the DNA 
template in only one direction; each new DNA strand 
is synthesized in the 5’ to 3’ direction. In the cell, 
DNA replication starts at specific sites on the 
chromosome DNA called origins of 
replication. Specialized proteins associated with the 
origin DNA play critically important roles in initiating 
DNA replication and in restricting the occurrence of 
DNA replication to once per cell cycle. 
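The semiconservative mechanism makes a clear prediction for the Meselson and Stahl experiment, in which DNA labeled with heavy nitrogen (15N) was allowed to replicate in a medium containing only light nitrogen (14N). The Python sketch below tracks each double helix as a pair of strand labels, "H" for heavy and "L" for light, and assumes that every newly made strand is light. It is a bookkeeping model only; the labels and function names are illustrative.

```python
from collections import Counter


def replicate_semiconservatively(helices):
    """Copy every double helix once.

    Each parental strand is kept and serves as the template for one new
    strand; new strands are assumed to be made only from light (14N) DNA.
    """
    daughters = []
    for strand_1, strand_2 in helices:
        daughters.append((strand_1, "L"))  # parental strand 1 + new light strand
        daughters.append((strand_2, "L"))  # parental strand 2 + new light strand
    return daughters


def density_bands(helices):
    """Group helices by how many heavy strands they contain."""
    band_name = {2: "heavy", 1: "hybrid", 0: "light"}
    return Counter(band_name[(s1 == "H") + (s2 == "H")] for s1, s2 in helices)


population = [("H", "H")]  # start with one fully heavy (15N) double helix
for generation in range(1, 4):
    population = replicate_semiconservatively(population)
    print(generation, dict(density_bands(population)))
```

After one generation every helix is hybrid (one heavy and one light strand); after two generations, half are hybrid and half are light, the pattern Meselson and Stahl observed.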


REVIEW 


This chapter describes the development of thought 
and experiments that led up to the historic discovery 
of the double-helix structure of DNA and beyond it to 
an understanding of the process of DNA replication. 


Chapter | 2 The DNA Double Helix 


To assess your comprehension of these topic areas, 
answer the following review questions: 


1. According to Levene’s research, what are the three 
basic components of the nucleotide units in DNA? 

2. Describe the method by which nucleotide units 
are linked to one another in DNA. 

3. What two significant observations were made by 
Erwin Chargaff’s group, and how were they impor- 
tant to establishing the structure of DNA? 

4. Explain how Rosalind Franklin’s x-ray diffraction 
studies helped to determine the structure of DNA. 

5. Given the base sequence of one strand in a DNA 
helix, explain how you can deduce the base 
sequence of the opposite DNA strand. 

6. In the race to solve the structure of the DNA helix, 
what were the key features of the DNA model pro- 
posed by the famous scientist Linus Pauling? 

7. Describe the structure of DNA referred to as a 
“replication fork,” and explain how it is related to 
the process of DNA replication. 

8. Determine what the results of the Meselson and 
Stahl experiments would have been if the mecha- 
nism of replication were actually (a) conservative 
and (b) dispersive (see Figure 2.16) instead of 
semiconservative. 

9. Describe the activities of some of the proteins 
required for DNA replication in addition to DNA 
polymerase. 

10. Explain the role of replication origins in bacterial 
and eukaryotic cells. 
Genetic Tweak Produces Mighty Mouse to Outrun Rivals 


The Guardian, November 2, 2007 

By Ian Sample 

Scientists have created a real-life Speedy Gonzales by 
genetically engineering a mouse which can easily outrun 
its natural cousins. 
When let loose on a treadmill in the laboratory, the 
mouse ran for up to six hours without stopping, covering 
many kilometres before finally taking a rest. Normal 
mice gave up after covering just 200 metres at the same 
speed. 

The souped-up rodent consumes 60% more food than 
other mice, but remains fitter and leaner. Surprisingly, 
the species also lives longer and is able to breed until a 
later age. 


With the goal of investigating exercise effects on can- 
cer, scientists generated a mouse with an extra copy 
of the PEPCK-C gene, placing it under the direction 
of a special DNA control region. This control region 
causes the mouse’s muscle cells to make high levels 
of PEPCK-C proteins. The PEPCK-C protein functions 
in an important biochemical pathway that mitochon- 
dria use to obtain energy for the cell via the citric acid 
cycle. (Mitochondria are the organelles in eukaryo- 
tic cells that produce energy for the cell’s functions.) 
PEPCK-C genes are usually most active in liver cells, 
making high levels of PEPCK-C protein, but little if any 
of this protein is normally made in muscle cells. The 
high level of PEPCK-C protein in the mouse’s mus- 
cle cells appears to have transformed these mice into 
“super mice.” Interestingly, the muscle cells from the 
super mice make many more mitochondria than nor- 
mal muscle cells, and their muscle cells produce less 
lactate, the chemical that causes a burning sensation 
when produced in muscles during exercise. 

But beware! Before you head out to look for a 
PEPCK-C protein supplement in the store, understand 
that the results in mice probably have limited applica- 
tion to human health. For one thing, the PEPCK-C gene 
may not function the same way in humans as it does in 
mice. In addition, the super mouse experiments show 
that increasing the amount of protein produced by a 
gene can have unanticipated effects. This experiment 
highlights the importance of controlling gene expres- 
sion, the subject of this chapter. 




Chapter | 3 DNA in Action 




LOOKING AHEAD 


This chapter explores the succession of observations 
and experiments that led scientists to make many criti- 
cal connections between deoxyribonucleic acid (DNA) 
and protein synthesis. We will describe the mecha- 
nisms by which the genetic information in DNA is con- 
verted into biochemical action through the encoded 
gene products, mostly proteins. On completing this 
chapter, you should be able to do the following: 


• Understand how scientists made the connection 
between chromosomes and biochemical activities 
taking place in the cell. 

• Follow the reasoning that led scientists to understand 
the genetic code in DNA and helped them 
relate the code to the specific amino acid sequence 
in a protein. 

• Recognize the roles played by ribonucleic acid 
(RNA) in transmitting the genetic information in 
DNA into the amino acid sequence in a protein. 

• Explain why the genetic code is considered to be 
degenerate. 

• Describe the role of translation in the synthesis of 
proteins in the cell. 

• Understand the need for gene control in cells. 

• Summarize the mechanisms behind some positive 
and negative gene expression controls. 


INTRODUCTION 


At the beginning of the 20th century, scientists 
significantly advanced human understanding when they 
realized that patterns of heredity could be explained 
by the activity of chromosomes. That concept led not 
only to the modern science of genetics but also to 
great advances in medicine, biotechnology, agricul- 
ture, and many other fields. Genetics has profoundly 
influenced how we think about ourselves and our 
world because it removed the mystery from heredity 
and made the biological nature of life seem more logi- 
cal and approachable. 

The great discoveries concerning DNA and chro- 
mosomes also raised a question that would influence 
biological thinking for more than half a century: What 
exactly is the biochemical connection between chro- 
mosomes and the hereditary traits visible in organ- 
isms? Put simply, how do chromosomes (Figure 3.1) 
and genes work? 

Modern biologists can partly answer that question. 
We know, for example, that the genetic information 
in chromosomes is put into action by directing the 
production of molecules, mainly proteins that both 
form the structures and carry out biochemical actions 
in our cells, tissues, and organs. Hence, all parts of 
human anatomy reflect the activity of our chromo- 
somes. Moreover, the entire chemistry of our bodily 
activities (whether moving or thinking or digesting) is 
governed by biological catalysts called enzymes, and 
most enzymes are proteins. Thus, the structure and 
chemistry of our bodies revolve around proteins. The 
genes, carried on chromosomes, specify the proteins 
(Figure 3.2). 

The fundamental ideas that relate chromosomes 
to genes and proteins were not arrived at in a sud- 
den eureka moment but were developed slowly by a 
FIGURE 3.1 Eukaryotic chromosomes. (A) A duplicated, condensed eukaryotic chromosome during mitosis. In the duplicated state, each of the two 
identical copies is called a sister chromatid. Each sister chromatid contains a single, linear, double-stranded DNA molecule extending 
from telomere to telomere through the centromere. During mitosis, the chromosome is attached to the mitotic spindle fibers via the 
centromere. The DNA helix is packaged by histone proteins (not shown) into chromosomes. (B) Electron micrograph of condensed X and 
Y chromosomes that have been extracted from a cell. (C) A map of the Y chromosome showing the long and short arms and a view of the Y 
chromosome banding pattern (Yq12, Yq11.223, Yq11.221, Yq11.31). 
succession of scientists working over many decades. 
This process began with the understanding that deox- 
yribonucleic acid (DNA) is a key substance of chromo- 
somes (Chapter 1), and continued with the discovery 
that the DNA structure has the capacity to encode 
genetic information (Chapter 2). This chapter focuses 
on the seminal experiments that related chromosomal 
DNA and genes to the production of proteins in the 
cell. We will consider the mechanisms by which the 
biochemical information in DNA is accessed and 
then put into action in the cell, mostly in the form of 
proteins. Proteins are linear strings of amino acids, 
whose sequences are encoded in DNA and interpreted 
through RNA. 

The DNA double helix itself is designed to encode, 
duplicate, and store genetic information. The chapter 
title, “DNA in Action,” refers to the biological mecha- 
nisms that permit DNA to direct the deeds of the cell 
via the synthesis and activity of specific proteins and 
RNA molecules. By considering the information in 
DNA as it is put into action, we are really studying 
how genes work; in doing so we are answering one of 
the great questions in biology: How do our chromo- 
somes express our hereditary traits? 


Much of the information contained in our chromosomes 
is accessed and executed through the synthesis of pro- 
teins, which are the end product of most of our hereditary 
instructions. 


FUNDAMENTAL SCIENCE CONNECTS 
DNA AND TRAITS 


The First Link between Inheritance and 
Enzymes 


In 1902, a prominent British physician named 
Archibald Garrod (Figure 3.3) wrote that certain dis- 
eases seemed to occur time and again in selected fam- 
ilies. Garrod studied the disease alkaptonuria in detail. 
People with this disease expel urine that rapidly turns 
black when exposed to air. The color changes because 
the urine contains alkapton (the chemical name is 
homogentisic acid), which darkens on exposure to oxy- 
gen. In normal individuals, alkapton is broken down 
in the body, but the alkaptonuria disease prevents this 
process, and alkapton is excreted in the urine. Garrod 
FIGURE 3.2 Chromosomes carry genes written in DNA language. A eukaryotic chromosome is shown removed from the nucleus and 
unpackaged to indicate the linear double-stranded DNA molecule that carries the gene. Eukaryotic genomes have a large amount of non- 
coding DNA. In eukaryotes, genes encoded along the DNA are transcribed into mRNA in the nucleus, and the RNA is processed before car- 
rying the genetic code for a new protein into the cytoplasm for translation. 
FIGURE 3.3 Garrod suggested a link between inheritance and enzymes. Archibald Garrod’s observations of diseases that occurred over and 
over again in the same families led him to postulate connections between disease, inherited factors, and enzymes, all relatively new 
concepts in medicine at the time. 


studied several generations of different families and 
became convinced that alkaptonuria, and some other 
metabolic disorders, were each controlled by a single 
inherited factor. Mendel’s work was in the process of 
being rediscovered in Garrod’s time (Chapter 1), and 
it was acceptable to think of inherited traits in terms 
of Mendelian “markers” (or genes, as they would later 
be called). Garrod concluded that certain diseases 
and disorders could be inherited much as flower color 
was inherited in pea plants. He further postulated, 
quite remarkably for his time, that a genetic disease is 
caused by a change in an ancestor’s genetic material, 
and the defect is passed along within the family. 

Although Garrod lacked a background in biochemis- 
try, he suggested that the cells in people with alkaptonu- 
ria could not break down alkapton because they lacked 
the necessary biochemical enzyme. Thus, the patient 
with alkaptonuria could have an enzyme defect, and if 
that defect was inherited, it followed that a gene defect 
could cause an inherited enzyme deficiency. The con- 
cept of an “enzyme” was also relatively new in the early 
1900s, and relating a chemical reaction to an enzyme 
put one at the forefront of science. Other scientists 
accepted Garrod’s insight, and the concept gradually 
developed that genes have something to do with enzyme 
production. This concept would be greatly strengthened 
by the work of Beadle and Tatum, but the world would 
wait 40 years for their work to be performed. 


In the early twentieth century, Garrod noted that some dis- 
eases that “run” in families were inherited in patterns similar 
to ordinary traits. He also recognized for the first time that a 
defect in an enzyme could cause an inherited disease. 




FIGURE 3.4 Beadle and Tatum linked genes and enzymes. George 
Beadle and Edward Tatum postulated the one gene-one enzyme the- 
ory that related gene activity to protein synthesis. 


In the early 1940s, while Avery and his colleagues 
were attempting to identify the role of DNA as the 
transforming principle in bacteria (Chapter 1), other 
researchers were trying to clarify the functions of 
chromosomes. Among the leaders in chromosome 
research were George Beadle and Edward Tatum 
at Stanford University (Figure 3.4). The Beadle and 
Tatum experiments were innovative and imaginative, 
especially the now famous “one gene-one enzyme” 
experiment. Beadle and Tatum wanted to find out what 
would happen to the biochemistry of an organism if 
they mutated an organism’s chromosome DNA. For 
these experiments they selected the common bread 
mold Neurospora crassa (Figure 3.5). At the time, 






FIGURE 3.5 Neurospora crassa. (A) Neurospora crassa is a reddish 
bread mold. (B) Fluorescence microscopy of Neurospora cells 
expressing histone proteins with bright green fluorescent markers. 
Histones are proteins that package DNA into chromosomes in the 
nucleus. 


Neurospora cells were among the few types of cells 
that could be grown in the laboratory on a defined 
growth medium where the exact nature and amount of 
each nutrient is controlled (for example, the medium 
might contain precisely 10 g of glucose per liter). 
Beadle and Tatum knew that the ability to control the 
growth conditions of the cells by changing the com- 
position of the medium was essential for the experi- 
ment. They needed what geneticists now call a genetic 
screen, a way to find rare Neurospora cells that could 
not grow without certain nutrients supplied in the 
medium. 

The genetic screen used in the Beadle and Tatum 
experiment is most easily understood by following the 
protocol they used, shown in Figure 3.6. First a culture 
of Neurospora cells was exposed to x-rays to cause 
mutations in the genome DNA sequences. Then the 
culture was split and half was transferred into “mini- 
mal medium,” that is, a growth medium lacking cer- 
tain specific nutrients (called nutrients A, B, C, and D 
in Figure 3.6). The other half of the cell culture was 
transferred into “enriched medium” containing all the 
nutrients. The scientists observed that the irradiated 
cells did not grow in the minimal medium or in mini- 
mal medium with nutrient A or B added. Importantly, 
FIGURE 3.6 Beadle-Tatum: the one gene-one enzyme experiment. 
(1) Neurospora cells were exposed to x-rays to cause mutations. 
Then the culture was split and transferred into (2) minimal medium 
lacking certain nutrients and (3) enriched medium containing all 
nutrients. The cells did not grow on the minimal medium, or on (4) 
minimal medium with nutrient A or B added. (5) Cells grew only on 
minimal medium containing added nutrient C or D. (6) The positions 
of nutrients A-D in the metabolic pathway indicate that the 
cells must be missing enzyme 2 from the pathway, leading to the 
conclusion that the mutation damaged the gene encoding enzyme 2. 


the cells did grow on minimal medium with nutrient C 
or D added. Because the cells grew only when nutrient 
C or D was added to the medium, Beadle and Tatum 
concluded that the cells must be missing enzyme 2 in 
the biochemical pathway (Figure 3.6). 
This meant that in these rare cells, the x-rays most 
likely damaged the DNA that encodes enzyme 2. This 
established the first experimental connection between 
a specific site in the DNA genome and a specific pro- 
tein product. 
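The reasoning behind Figure 3.6 can be written as a small Python function. The sketch assumes the pathway runs A → B → C → D, with enzyme 1 converting A to B, enzyme 2 converting B to C, and enzyme 3 converting C to D; that layout is an assumption chosen to match the conclusion described above, and the function names are hypothetical. A supplement rescues growth only if it enters the pathway after the blocked step, so the first rescuing nutrient pinpoints the missing enzyme.

```python
# Assumed pathway order; enzyme k converts PATHWAY[k - 1] into PATHWAY[k].
PATHWAY = ["A", "B", "C", "D"]


def grows_with(supplement, blocked_enzyme):
    """Predict whether a mutant lacking `blocked_enzyme` grows on minimal
    medium plus `supplement`: only nutrients made after the block can help."""
    return PATHWAY.index(supplement) >= blocked_enzyme


def infer_blocked_enzyme(growth):
    """Work backward from growth results ({nutrient: grew?}) to the missing enzyme."""
    rescued = [PATHWAY.index(n) for n, grew in growth.items() if grew]
    not_rescued = [PATHWAY.index(n) for n, grew in growth.items() if not grew]
    if not rescued:
        raise ValueError("no supplement helped; the block lies beyond D or outside this pathway")
    blocked = min(rescued)  # the first rescuing nutrient sits just after the block
    if blocked == 0 or any(i >= blocked for i in not_rescued):
        raise ValueError("pattern does not fit a single block inside this pathway")
    return blocked


# Growth results described for Figure 3.6
observed = {"A": False, "B": False, "C": True, "D": True}
print(infer_blocked_enzyme(observed))  # 2
```

The same logic run in reverse (`grows_with`) predicts the growth pattern for a block at any step, which is how a single experiment can map a mutation onto one enzyme.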

Beadle and Tatum used haploid Neurospora cells 
in their experiment; each nucleus contained only 
one copy of each chromosome (and hence one copy 
of each gene). When haploid cells are treated with a 
mutagen, which causes mutations in DNA, a change 
in phenotype can be observed when one copy of the 
gene is mutated, because haploid cells do not carry 
a second copy of any gene. Diploid cells, which 
each contain two copies of every chromosome (and 
hence, two copies of each gene), decrease the odds 
for success because the mutagenesis treatment often 
must disable both copies of the target gene for the 
phenotype of the cell to be visibly affected by the 
mutation. Most mutations are silent or recessive in dip- 
loid cells because the single remaining normal gene is 
sufficient to do the work in the absence of the second 
gene product. Less often the mutant gene has a domi- 
nant phenotype, which means that the mutant gene 
function overrides the normal gene function and the 
diploid cells are visibly affected by the mutation (see 
Chapter 10). 
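Simple arithmetic shows why haploid cells made the screen practical. Suppose, purely for illustration, that a mutagen disables any particular copy of a gene with probability 1 in 10,000, and that hits on the two copies in a diploid cell are independent. A recessive mutant phenotype then requires both copies to be hit, so its probability is that number squared. The Python sketch below compares the cases under those assumed numbers; the function name is hypothetical.

```python
def chance_of_mutant_phenotype(p_hit, copies, recessive=True):
    """Probability that a mutagenized cell shows the mutant phenotype.

    Assumes every copy of the gene is hit independently with probability p_hit.
    Recessive: all copies must be disabled. Dominant: one damaged copy is enough.
    """
    if recessive:
        return p_hit ** copies
    return 1 - (1 - p_hit) ** copies


p = 1e-4  # hypothetical chance of disabling one gene copy
haploid = chance_of_mutant_phenotype(p, copies=1)
diploid = chance_of_mutant_phenotype(p, copies=2)
print(f"haploid: 1 in {1 / haploid:,.0f}")
print(f"diploid (recessive): 1 in {1 / diploid:,.0f}")
```

Under these assumed numbers, about 1 in 10,000 haploid cells would show the mutant phenotype, compared with about 1 in 100 million diploid cells for a recessive mutation.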

Through their experiments, Beadle and Tatum 
noted that mutant Neurospora cells passed the muta- 
tion to the next generation, and each mutation could 
be explained as damage to a specific cellular cata- 
lyst (a specific enzyme protein). Most enzymes are 
protein molecules that carry out virtually all the bio- 
chemical reactions in a cell (a few enzymes are RNA 
molecules or protein-RNA complexes). The key fea- 
ture of enzymes is that they speed up biological reac- 
tions, so that an entire biochemical reaction occurs in 
a fraction of the time needed to complete the reaction 
without enzyme catalysis. Additionally, the chemical 
structure of the enzyme does not change as a result 
of the reaction, so the same enzyme will catalyze the 
same reaction repeatedly. Beadle and Tatum’s work 
indicated that a mutation in a cell’s DNA is correlated 
with the inability of the cell to produce an important 
growth substance. This missing substance could be 
traced to a defect in a specific enzyme that catalyzes 
the reaction for synthesis of the substance. 

Two major concepts came from Beadle and Tatum’s 
work: 


• Mutations can be traced to the hereditary 
constituents in the cell (genes). 

• DNA in genes has something to do with the 
cellular production of key enzymes. 


Beadle and Tatum came up with the idea that each 
gene in a cell influences the production of a cellular 
enzyme: the “one gene-one enzyme hypothesis.” They 
proposed that altering one enzyme would in turn dis- 
rupt at least one event in a series of chemical reactions 
in one or more biochemical pathways in the cell. 


George Beadle and Edward Tatum found that when the 
chromosomes in the Neurospora cells were damaged, 
the resulting cellular defects were passed on to the next 
generation of cells. In addition, they theorized that each 
chromosome defect interferes with the production of one 
enzyme. 


In the 1940s, biochemists agreed that enzymes are 
specific types of proteins and that enzymes are cata- 
lysts that drive the synthesis of all of an organism’s 
structural parts (e.g., proteins, carbohydrates, lipids, 
nucleic acids, blood components, hormones, anti- 
bodies, hair fiber, muscle proteins, and on and on). 
It stood to reason that if genes affect the production 
of enzymes and if enzymes are proteins, then genes 
must affect protein production. This is a fundamental 
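tenet of molecular biology. However, even though the 
“one gene-one enzyme” theory is critically important 
for understanding genes and inheritance, it is an 
oversimplification: many genes encode proteins that 
are not enzymes, some genes encode RNA molecules 
rather than proteins, and some enzymes are built from 
several protein chains encoded by different genes. 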

From a historical standpoint, it is easy to understand 
why scientific interest in DNA grew during the 1940s 
and early 1950s. Avery’s group was in the process of 
identifying DNA as the substance of bacterial 
transformation, Hershey and Chase would show in 1952 
that DNA was the determining molecule in viral 
replication (Chapter 1), and 
Beadle and Tatum were focusing attention on genes for 
their ability to transmit mutations between generations 
and for the connection between genes and the syn- 
thesis of proteins. The implications of DNA in cellular 
functions captured the imaginations of biologists, and 
deservedly so because the function of DNA stands at 
the very roots of molecular biology. 


Proteins are Linear Sequences of 
Amino Acids 


Proteins do much of the work and make up many of 
the structures in all cells, be they microbial, plant, 
animal, or other cells. Chemically, proteins are composed 
of linear chains of building-block units called amino 
acids (Figure 3.7A). Cells produce protein chains 
of different lengths that are composed of different 
sequences (order) of the 20 amino acids found in 
biological organisms (Figure 3.7B). The sequence of the 
human genome has revealed about 20,000 different 
human genes, but scientists predict that human cells 
make more than 100,000 different proteins, each a 
distinct combination of the 20 amino acids (see 
Chapter 6). It is essential to understand that one 
protein is distinguished from another by the sequence 
of the amino acids in the protein. In protein production 
(protein synthesis), amino acids are linked together in 
the specific sequences encoded in the DNA. 
Think for a moment how many different English words 
have been composed using a 26-letter alphabet. For 
proteins, the alphabet is not composed of 26 letters but 
of 20 amino acids, and the variety of possible combi- 
nations of amino acids leads to a tremendous variety 
of different proteins. 
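The comparison with the English alphabet can be made concrete. A chain n amino acids long has 20 choices at each position, so there are 20^n possible sequences. The Python sketch below counts them for a few arbitrary chain lengths; Python integers have no fixed size limit, so the counts are exact. The function name is illustrative.

```python
def possible_sequences(length, alphabet_size=20):
    """Number of distinct chains of `length` units drawn from `alphabet_size` building blocks."""
    return alphabet_size ** length


for n in (2, 3, 10, 100):
    count = possible_sequences(n)
    print(f"{n:>3} amino acids: {count:.3e} possible sequences ({len(str(count))} digits)")
```

Even a modest chain of 100 amino acids could in principle take about 1.27 × 10^130 different sequences, a 131-digit number, far more than the number of proteins any organism actually makes. Passing `alphabet_size=26` gives the corresponding count for English letter strings.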

The amino acids in a protein are linked together by 
peptide bonds made between adjacent amino acids 
(Figure 3.8). The amino acid sequence of each protein 
is unique to that specific protein. For example, insulin is 
a protein made by the cells in your pancreas to regulate 
FIGURE 3.7 Amino acids are the building blocks of proteins. (A) Each amino acid has four chemical groups: amino, hydrogen, carboxyl, 
and R group, attached to a central or “α” carbon atom. Only the R group differs for each amino acid and confers specific chemical and 
physical properties of polarity, size, and charge. The R groups on the valine and glutamate amino acids are indicated. (B) Proteins in living cells 
contain combinations of the 20 different amino acids shown. Each of the 20 amino acids has a different R group, also called a side chain, 
that confers unique properties to each amino acid as indicated. 
FIGURE 3.8 Proteins are made by linking amino acids together with peptide bonds. During protein synthesis, the amino portion of one 
amino acid and the acid portion of another amino acid join together in a dehydration synthesis reaction to form a peptide bond. A dipeptide 
and a water molecule result from the reaction. Additions at the carboxyl (acid) end of this dipeptide can lengthen the protein. Protein chains 
are linear, never branched, and nearly all fold into a specific shape to function. 
FIGURE 3.9 Views of the insulin protein. (A) The insulin protein is made in the pancreas as a single protein chain that is connected internally 
by three disulfide bonds (yellow), which form between its cysteine amino acid side chains. Each amino acid of insulin is represented here 
as a small circle. Insulin is subsequently cleaved into three protein chains and active insulin contains only the A-chain and B-chain, con- 
nected to each other by two disulfide bonds. (B) A ribbon drawing of the cleaved insulin protein chain illustrates the three-dimensional folds 
of the protein into three helices, three turns, and a loop. The ribbon traces the path of the protein’s “backbone” in space. (C) The overall shape 
of insulin is shown in a space-filling model. All atoms are shown as solid spheres, except hydrogen atoms, which are omitted for clarity. The 
sizes of the remaining atoms are inflated slightly to make up for the missing hydrogen. The atoms are colored by element as shown. 


the amount of glucose sugar in the bloodstream. All the 
insulin proteins made by a human pancreas have the 
exact same order (sequence) of amino acids in the pro- 
tein chain; they all fold into identical three-dimensional 
shapes and participate in the same biochemical proc- 
esses and reactions in the body (Figure 3.9). 
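Because every peptide bond joins two neighboring amino acids in an unbranched chain, the bookkeeping is simple: a chain of n amino acids contains n − 1 peptide bonds, and forming them by dehydration synthesis releases n − 1 water molecules. The Python sketch below applies this to a dipeptide, to active insulin (whose A-chain has 21 amino acids and B-chain has 30), and to one beta globin chain of 146 amino acids. The disulfide bonds that link insulin's chains are a different kind of bond and are not counted. The function name is illustrative.

```python
def peptide_bonds(chain_lengths):
    """Peptide bonds in a protein made of one or more separate linear chains.

    Each unbranched chain of n amino acids has n - 1 peptide bonds, and each
    bond formed by dehydration synthesis releases one water molecule.
    Disulfide bridges between chains are a different kind of bond and are ignored.
    """
    return sum(n - 1 for n in chain_lengths if n > 0)


print(peptide_bonds([2]))       # a dipeptide (Figure 3.8): 1 bond, 1 water
print(peptide_bonds([21, 30]))  # active insulin, A-chain + B-chain: 49 bonds
print(peptide_bonds([146]))     # one beta globin chain: 145 bonds
```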


The importance of the specific sequence of amino 
acids to the function of each protein was highlighted 
by the pioneering work of Vernon M. Ingram and his 
group at Cambridge University in 1956. Ingram 
discovered that sickle cell disease is caused by a 
change in a single amino acid of the β globin protein, 
the result of a single mutation in the β globin gene 
(see Chapter 10). 
FIGURE 3.10 α and β globin proteins make hemoglobin. (A) In humans, the gene for the β subunit of hemoglobin is located on 
chromosome 11. Because humans have two copies of each chromosome, each cell carries two β globin genes. Both copies are used 
to synthesize β globin protein. The gene for the α subunit appears twice on chromosome 16 in humans. All four copies of the α globin gene 
contribute to producing the α hemoglobin protein chain. Overall, equal numbers of α and β globin proteins are produced, and two of each 
type combine to form hemoglobin. (B) The hemoglobin is shown as a space-filling model. Heme is a small organic molecule that binds in a pocket on 
each subunit of the protein, enabling the hemoglobin to carry oxygen. 
acids in the protein (top). The mutation in the DNA causes a change in the amino acid sequence of the beta globin protein that promotes the 
formation of long fibers of hemoglobin proteins. The hemoglobin fibers distort the normally disk-shaped red blood cells into sickle shapes, 
and as a consequence they get stuck in the small blood vessels and capillaries, causing severe disease symptoms. 


The globin proteins are part of a large multigene family 
that includes the α and β globin proteins (Figure 3.10), 
myoglobin, and the plant leghemoglobins; all are 
proteins that bind to and carry oxygen. Hemoglobin 
contains two α globin proteins and two β globin proteins 
arranged in a complex that responds to the amount 
of oxygen in the environment by changing its shape, 
which affects the ability of hemoglobin to bind to oxy- 
gen (see Figure 3.10). 

Ingram’s experiments indicated that sickle cell dis- 
ease is caused by one incorrect amino acid out of the 
146 amino acids in the beta globin protein (Figure 3.11). 
The replacement of the charged (highly polar) glutamic 
acid with a nonpolar valine residue causes the mutant 
hemoglobin complexes to form long fibers that physi- 
cally distort the cells into a crescent or sickle shape. 
The sickle-shaped red blood cells cannot move freely 
through the smaller blood vessels and capillaries. This 
interferes with oxygen transport to the cells, result- 
ing in painful symptoms and, sometimes, early death. 
At the time Ingram began his work, sickle cell disease 
was known to be inherited, but Ingram’s research took 
science a step further by showing that a gene defective 
in its encoding of a single amino acid causes a devastat- 
ing disease. 
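Ingram's comparison amounts to lining up two amino acid sequences position by position and reporting where they differ. The Python sketch below does this for the first eight amino acids of normal and sickle beta globin, written in the standard one-letter code (V = valine, H = histidine, L = leucine, T = threonine, P = proline, E = glutamic acid, K = lysine). Only this short fragment is shown; the full chain has 146 amino acids. The single difference it finds, glutamic acid replaced by valine at position 6, is the sickle cell change. The function name is illustrative.

```python
def substitutions(normal, mutant):
    """List (position, normal residue, mutant residue) wherever two equal-length
    protein sequences differ. Positions are numbered from 1."""
    if len(normal) != len(mutant):
        raise ValueError("this simple comparison assumes sequences of equal length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(normal, mutant), start=1)
        if a != b
    ]


normal_beta = "VHLTPEEK"  # first 8 amino acids of normal beta globin
sickle_beta = "VHLTPVEK"  # same stretch in sickle cell beta globin
print(substitutions(normal_beta, sickle_beta))  # [(6, 'E', 'V')]
```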


Genes encode the sequences of amino acids in protein
chains. Ingram’s research pointed to the possibility that a
devastating disease could be the result of a single incorrect
amino acid in an entire protein.
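A small Python sketch can make the idea of a single-base change concrete. It assumes the widely cited sickle cell change in which the DNA codon GAG (glutamic acid) becomes GTG (valine); the nine-base fragment CCT GAG GAG and the three-entry codon table are illustrative assumptions, not the full beta globin gene or the full genetic code.

```python
# Minimal sketch: one base change (A -> T) in one codon changes one amino acid.
# Assumptions: the illustrative fragment CCT GAG GAG (Pro-Glu-Glu) and the
# GAG -> GTG change widely cited for sickle cell hemoglobin.

# Tiny subset of the genetic code, written as DNA coding-strand codons.
CODON_TO_AA = {"CCT": "Pro", "GAG": "Glu", "GTG": "Val"}


def translate(dna: str) -> list[str]:
    """Read the sequence three bases at a time and look up each codon."""
    return [CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)]


def point_mutation(dna: str, position: int, new_base: str) -> str:
    """Return a copy of dna with the base at a 0-based position replaced."""
    return dna[:position] + new_base + dna[position + 1:]


normal = "CCTGAGGAG"
sickle = point_mutation(normal, 4, "T")  # GAG -> GTG in the second codon

print(translate(normal))  # ['Pro', 'Glu', 'Glu']
print(translate(sickle))  # ['Pro', 'Val', 'Glu']
```

Every codon except the altered one is read exactly as before, so the protein differs at a single position.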


Scientists Link Genes with Amino Acid 
Sequence of a Protein 


One of the first to suggest a cellular coding mechanism
for proteins was the Austrian physicist Erwin
Schrödinger. In his 1944 book What Is Life?,
Schrödinger suggested that the cellular chromosomes
might contain a “code-script.” He proposed that variations
in the arrangement of atoms in the chromosome could
produce a Morse code-like packet of information. The
prevailing wisdom of the day was that the genetic material
in chromosomes was protein, and so DNA did not enter
Schrödinger’s thinking; however, the notion of a code
embedded in the arrangement of atoms in chromosomes
was an important step.

Shortly after the 1953 announcement of the 
Watson-Crick model of DNA (Chapter 2), another 
physicist named George Gamow wrote to Watson 
and Crick and proposed that the arrangement of bases 
along the double helix might produce a template of 
sorts, a series of chemical “holes.” Gamow thought 
the holes could have different structures, and differ- 
ent amino acids could then fit into different holes and 
link up to form a protein. In this model, DNA acted 
as a direct, physical template for building protein 
chains. Ironically, Gamow also became very interested 
in ribonucleic acid (RNA), and in 1954 he organized 
the first “RNA Club” to investigate the role of RNA in 
protein synthesis. Members wore a tie with a sinuous 
lime-green curl of nucleic acid flanked by boxy yellow 
outlines of purines and pyrimidines on a black back- 
ground. 


Gamow’s model correlated the order of DNA bases (the 
DNA sequence) with the order of the amino acids in a pro- 
tein, but it required physical contact between the DNA and 
protein. 


As the 1950s progressed, evidence continued to 
accumulate that supported DNA as the material of 
heredity and the idea that the genetic message in DNA 
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is expressed through proteins. Many scientists also 
supported the concept that some sort of genetic code 
exists in DNA and that the code actually specifies 
the sequence of amino acids in a protein. Eventually 
the questions about the genetic code were answered, 
but only after considerable experimentation and edu- 
cated guesswork provided a much better grasp of the 
biochemical realities of living things, starting with the 
components inside cells. 

To understand the process of protein synthesis, bio- 
chemists had to know whether the chemical informa- 
tion in DNA passes directly from the nucleus to the 
site of protein synthesis or whether some intermedi- 
ary substance is required to transfer the information 
from DNA to protein. Recall that because prokaryotes 
do not have nuclei, transcription and translation take 
place in the same cellular compartment in bacterial 
cells. However, in eukaryotic cells, protein synthesis takes
place in the cytoplasm, whereas the chromosomes are confined
to the nucleus (mitochondria, and chloroplasts in plants, also
carry small DNA genomes of their own). In eukaryotic cells, the
direct passage of information between the two sepa- 
rate cellular compartments seemed unlikely—that is, 
it seemed improbable that the DNA itself moved out 
of the nucleus. A better possibility, scientists began to 
think, was a relay system operating to transfer genetic 
information between the nucleus and cytoplasm. In the 
1940s, biochemists found that cells and organs under- 
going protein synthesis made unusually large amounts 
of RNA, a chemical cousin of DNA. This correlation 
between protein synthesis and RNA synthesis, as well 
as several other lines of evidence, suggested that RNA 
is the chemical intermediary that transfers genetic 
information from the DNA in the nucleus to the pro- 
tein made in the cytoplasm. 

Although RNA and DNA are both nucleic acids 
with similar molecular structures (Figure 3.12), the 
molecules differ in three important ways: 


• The sugar in the backbone of RNA is ribose; in
DNA it is deoxyribose.

• RNA has the base uracil, whereas DNA has
thymine.

• An RNA molecule is usually single-stranded
but often base-pairs with itself; DNA usually has
two strands that are base-paired to each other
(double-stranded).


Despite the chemical differences between DNA and
RNA, it is now clear that base pairing is an essential
feature of both molecules. DNA base pairing establishes
the double-stranded DNA helix. RNA is typically
single-stranded, but it contains many regions where the
strand base-pairs with itself, creating loops and hairpin
structures (Figure 3.13).
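The self-pairing described above follows the same complementary rules as DNA, with U in place of T (A pairs with U, G with C). As a rough sketch, the hypothetical function below checks whether the two ends of an RNA strand can fold back on each other to form the stem of a hairpin. It ignores G-U pairs and the stability calculations that real RNA-folding software performs, and the minimum loop of three bases is an assumption of the sketch.

```python
# Watson-Crick pairing partners in RNA (G-U pairs are ignored in this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

MIN_LOOP = 3  # assumption: the unpaired loop needs at least three bases


def forms_stem(rna: str, stem_len: int) -> bool:
    """Return True if the first and last stem_len bases can pair as a hairpin stem.

    Because the strand folds back on itself, the two arms run antiparallel:
    the 1st base pairs with the last, the 2nd with the second-to-last, etc.
    """
    if len(rna) < 2 * stem_len + MIN_LOOP:
        return False
    left = rna[:stem_len]
    right = rna[-stem_len:]
    return all((a, b) in PAIRS for a, b in zip(left, reversed(right)))


# Hypothetical sequence: stem arm GGCG, loop AAAA, stem arm CGCC.
print(forms_stem("GGCGAAAACGCC", 4))  # True
print(forms_stem("GGCGAAAAGGCG", 4))  # False: the ends are not complementary
```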
FIGURE 3.12 RNA and DNA are chemical cousins. Though closely related, DNA and RNA have important differences. (A) RNA has uracil
(U) instead of thymine (T) nucleotides; DNA is typically double stranded in the cell, whereas RNA is single stranded but usually base pairs
with itself. (B) Ribose, the sugar in RNA, has a hydroxyl (-OH) group on the 2’ carbon, whereas deoxyribose in DNA has a hydrogen at the 2’
carbon. This makes RNA more reactive and less stable than DNA.


Each DNA base sequence is a template for RNA 
synthesis, much as DNA acts as a template for the 
synthesis of new DNA during replication (Chapter 2). 
Biochemists proposed that if the synthesis of RNA from 
DNA occurs in the nucleus, then the RNA copy could 
travel to the cytoplasm, where it would convey the 
amino acid sequence information for the synthesis of a 
protein. Indeed, radioactive isotope tracer experiments 
in eukaryotic cells showed that RNA molecules are 
made in the nucleus and then move from the nucleus 
to the cytoplasm. 
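Using a DNA strand as a template comes down to complementary pairing, with U placed opposite each A in the template. A minimal sketch, assuming the template strand is written 5’ to 3’ and using a hypothetical six-base example: the RNA is built by reading the template from its 3’ end, so it ends up matching the other (coding) DNA strand with U in place of T.

```python
# DNA template base -> RNA base that pairs with it (U replaces T in RNA).
DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the RNA (written 5'->3') copied from a DNA template written 5'->3'.

    The RNA grows 5'->3' while the antiparallel template is read 3'->5',
    so the template is walked from its 3' end, pairing each base in turn.
    """
    return "".join(DNA_TO_RNA_PARTNER[base] for base in reversed(template_5to3))


template = "AGCCAT"  # hypothetical template strand, 5'->3'
coding = "ATGGCT"    # its complementary (coding) strand, 5'->3'

rna = transcribe(template)
print(rna)                               # AUGGCU
print(rna == coding.replace("T", "U"))   # True
```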

Important evidence supporting the role of RNA in 
information transfer in the cell came from experiments 
with viruses. Bacteriophages are viruses that infect 
bacterial cells (but do not attack eukaryotic cells). 
Researchers found that when bacteria are infected
by bacteriophages, the phage DNA genomes enter the
cells (Figure 3.14). The bacteria synthesize bacteriophage
RNA before they begin synthesizing phage proteins.
Therefore, RNA appeared to be the intermediary compound
between DNA and protein. Moreover, infection with
the tobacco mosaic virus (TMV), which has no DNA
(its small genome is entirely RNA; Figure 3.15), causes
infected tobacco leaf cells to produce viral proteins.
Scientists reasoned that the TMV RNA genome carries
the information needed to synthesize the viral proteins
without requiring DNA. The functional links between
DNA, RNA, and protein began to emerge.


Mounting evidence suggested that RNA molecules trans- 
fer genetic instructions from DNA to the protein synthesis 
machinery of the cell. 
FIGURE 3.13 RNA is single stranded but easily forms base pairs with itself. (A) Nucleotides carrying the four bases are linked into the
backbone of an RNA molecule. (B) Diagrammatic representation of a longer RNA molecule forming a stem-loop structure.


FRANCIS CRICK STARTS TO UNRAVEL THE 
GENETIC CODE 


At the start of the 1960s, the evidence that DNA car- 
ries genetic information was very compelling. By then, 
biochemists knew about the DNA double helix; they 
had learned that cells linked amino acids together to 
make protein chains, and the concept that genetic 
information “flowed” in the cell from DNA to RNA to 
protein was widespread. Still, fundamental questions 
remained: How is the genetic message in DNA trans- 
mitted to RNA, and how is the RNA carrier involved 
in protein synthesis? More specifically, how does the 
order of bases in the DNA encode the order of amino 
acids in the corresponding protein? 

Answers to these questions came from a series of 
elegant experiments reported in 1961 by Francis Crick 
and his colleagues. They reasoned that the simplest 
genetic code in DNA probably consists of a series of 
blocks of chemical information, with each block hav- 
ing a linear relationship with one amino acid in the 
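protein. In addition, they reasoned that a block of three
DNA bases would be the smallest block able to specify each
of the 20 amino acids: four bases taken two at a time give
only 16 combinations, but taken three at a time they give 64.
To test these ideas, Crick’s group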
performed experiments on viruses containing DNA 
genomes. They altered the DNA sequence of the viral 
genome, then infected bacterial cells and asked what 
would happen to the virus, the viral protein products, 
and the cells. 
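The reasoning about block size can be checked by simply counting. This Python sketch enumerates every possible block of one, two, and three bases (the helper names are hypothetical). Blocks of two give only 16 combinations, too few for 20 amino acids, whereas blocks of three give 64, leaving room for some amino acids to be specified by more than one block.

```python
from itertools import product

BASES = "ACGT"       # the four DNA bases
N_AMINO_ACIDS = 20   # amino acids the code must distinguish


def count_blocks(block_len: int) -> int:
    """Count every distinct block of block_len bases by listing them all."""
    return sum(1 for _ in product(BASES, repeat=block_len))


def smallest_sufficient_block() -> int:
    """Return the shortest block length whose combinations cover all 20 amino acids."""
    block_len = 1
    while count_blocks(block_len) < N_AMINO_ACIDS:
        block_len += 1
    return block_len


for n in range(1, 4):
    print(n, count_blocks(n))        # prints 1 4, then 2 16, then 3 64
print(smallest_sufficient_block())   # 3
```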


The experiments were simple yet ingenious. Crick’s
group altered the viral DNA using acridine compounds,
which insert or delete single base pairs in DNA.
The Crick group sought to find out whether deleting
one DNA base would change the amino acids in the
protein synthesized. For example, let’s start with the
following DNA sequence:


AGGCATGCATTG (parent DNA)


After acridine treatment, this sequence might have
been changed, for example, to


AGGATGCATTG (mutant DNA)


In this case, the fourth base (C) was deleted. 

If the genetic code were read in blocks of three 
DNA bases (Figure 3.16A), the first block of the pre- 
ceding sequence would remain unchanged, but the 
succeeding blocks of three would be different: 


AGG CAT GCA TTG (parent DNA) 


AGG ATG CAT TG (mutant DNA) 


As a result of this mutation, the first amino acid
would appear in its proper position, but the
subsequent amino acids
FIGURE 3.14 Bacteriophage T4 infects bacterial cells. (A) Diagram of T4 phage, including capsid (icosahedral head), tail proteins,
baseplate, and tail fibers. (B) Scanning electron micrograph of a T4 bacteriophage virus. (C) The T4 DNA is in the phage head and is injected
through the cell membrane and into the bacterial cell. (D) An electron micrograph of an infected bacterial cell producing many T4
bacteriophage offspring.


in the mutant protein would change due to the new 
blocks of three nucleotides generated by the mutation. 
Indeed, in Crick’s experiments, the first amino acid 
occurred in its correct position in the protein, but all 
succeeding amino acids were different. 

Next, Crick and his colleagues deleted two DNA 
bases, predicting that the first block would remain the 
same, but the subsequent blocks would change yet 
again: 


AGG CAT GCA TTG → (delete bases 4 and 5, C and A)
→ AGG TGC ATT G


As you can see, deleting two bases at a fixed point 
changes the three-base blocks after the deletion. In the 
protein, the amino acid encoded by AGG occurs in the 
correct position, but all succeeding amino acids are 
different because all the following blocks (e.g., TGC 
and ATT) are different. 

Finally, the key experiment was performed. Crick 
and his colleagues removed three bases to observe the 
effect on the amino acids in the protein: 


AGG CAT GCA TTG → (delete bases 4, 5, and 6: C, A, T)
→ AGG GCA TTG
FIGURE 3.15 Tobacco mosaic viruses have RNA genomes. TMV research provided evidence that RNA has a central role in protein
synthesis. (A) Electron micrograph of tobacco mosaic viruses (TMV). (B) TMV has a thin, rodlike shape, defined by a coat made up of
repeated protein units that assemble around the single-stranded RNA genome.


Deletion of the fourth, fifth, and sixth bases together 
(CAT) has quite a different result. This mutant protein is 
more similar to the original protein because it is missing 
only the second amino acid, and the other amino acids 
in the mutant protein are correct. This is a key experi- 
ment because it shows that the reading of the code was 
restored when three bases were removed. Crick’s group 
went on to show that this was the case whenever bases
were added to or removed from DNA in multiples of three.
Adding or removing other numbers of bases resulted
in wholesale changes of the amino acid sequence of
the protein after the site of the mutation. Crick’s team
deduced that the genetic code is read in sets of three
consecutive bases (triplets), later termed codons.


By testing mutations that eliminated bases one at a time 
and in multiples and observing the mutant proteins made, 
Crick’s team established that a nucleotide block of three 
bases corresponds to a single amino acid. 
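The reading-frame logic of Crick’s deletion experiments can be reproduced mechanically. The sketch below uses the parent sequence from the examples above, deletes one, two, or three bases starting at base 4, and regroups what is left into blocks of three read from the first base. The helper names are hypothetical, and the sketch tracks only the DNA blocks, not the amino acids they would encode.

```python
def codons(dna: str) -> list[str]:
    """Split a sequence into consecutive three-base blocks, starting at the first base.

    A trailing partial block (one or two leftover bases) is kept so the
    effect of a frameshift stays visible.
    """
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def delete(dna: str, start: int, count: int) -> str:
    """Remove `count` bases beginning at 1-based position `start`."""
    return dna[:start - 1] + dna[start - 1 + count:]


parent = "AGGCATGCATTG"
for n in (1, 2, 3):
    mutant = delete(parent, 4, n)
    print(n, " ".join(codons(mutant)))
# 1 AGG ATG CAT TG
# 2 AGG TGC ATT G
# 3 AGG GCA TTG

# Removing exactly one whole block (bases 4-6) leaves every other block intact.
parent_blocks = codons(parent)
print(codons(delete(parent, 4, 3)) == parent_blocks[:1] + parent_blocks[2:])  # True
```

Only the three-base deletion leaves the blocks after the deletion identical to those of the parent, which is the restored reading described above.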


The Genetic Code is Cracked Using 
Synthetic RNA 


Additional elegant experiments, reported beginning in 1961 by
Marshall Nirenberg with Heinrich Matthaei and independently
by Har Gobind Khorana, were designed to determine the
genetic codons for all 20 amino acids. To find out which
mRNA codons call for which amino acids, the biochemists
combined in vitro (outside the cell) different individual
synthetic RNA molecules with enzymes, amino acids, and
other essential compounds to see what protein would be
made in the test tube (Figure 3.16B). They found that the
synthetic RNA molecule consisting of only uracil residues
(U-U-U-U-U-U-U-U-U) produced a protein consisting of just
FIGURE 3.16 Codons deduced for all 20 amino acids. (A) The message
in (1) is written in three-letter English words reading from left to
right. Each three-letter word has meaning in the message. In (2), we
have a biochemical message written in RNA bases and read three
letters at a time from left to right. Each three-letter “word” (or codon)
specifies an amino acid in the protein; the AUG start codon is not
shown, and the stop signal is at the far right. (B) The experiment that
broke the genetic code for the amino acid phenylalanine. Researchers
synthesized a strand of RNA containing only uracil (U-U-U-U-U-U-U-U...)
and placed it into a test tube with 20 different amino acids and all the
materials needed to synthesize protein. The system generated a protein
consisting only of phenylalanines. Thus, from these results scientists
knew that the codon UUU specifies the amino acid phenylalanine.
FIGURE 3.17 Genetic code table. This chart shows how the three-letter
mRNA codons correspond to specific amino acids. For example, to look
up the amino acid for the codon AGU, use the first base in the codon (A)
to select a row in the far left column of the table. Use the second base
(G) to select a column from the top of the table. Within the box at the
intersection of the selected row and column, use the third codon base
(U), listed in the far right column of the table, to select the amino acid.
The amino acid is identified as serine (Ser): the AGU codon specifies
serine in the protein.


the amino acid phenylalanine. From this result, scientists
surmised that the RNA three-letter code word, or
codon, for phenylalanine is U-U-U. Working backward,
scientists reasoned that the corresponding sequence on the
DNA template strand is A-A-A (the complementary DNA
coding strand reads T-T-T). Similar experiments rapidly led to
the identification of RNA codons for all 20 amino acids 
and showed that many amino acids are encoded by 
more than one codon (Figure 3.17). The codons UAG, 
UGA, and UAA are translation “stop” signals. For their 
work in determining the chemical nature of the genetic 
code, Nirenberg and Khorana were awarded the 1968 
Nobel Prize in physiology or medicine. 
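The table lookup described for Figure 3.17 can also be done in a few lines of Python. This sketch builds the standard genetic code from the conventional U, C, A, G ordering of codon tables and writes amino acids as one-letter abbreviations (F for phenylalanine, S for serine, L for leucine, M for methionine, and * for stop), a convention the text does not use. It then repeats the AGU lookup, the poly-U result, the three stop codons, and one example of an amino acid with several codons.

```python
from collections import Counter
from itertools import product

# Standard genetic code. Codons are generated in table order (first base
# varies slowest, third base fastest), matching the 64 letters below.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate codon by codon from the first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(CODE["AGU"])                # S: serine, as in the Figure 3.17 lookup
print(translate("UUU" * 5))       # FFFFF: poly-U gives a chain of phenylalanines
print(sorted(c for c, aa in CODE.items() if aa == "*"))  # ['UAA', 'UAG', 'UGA']
print(Counter(CODE.values())["L"])  # 6: leucine has six codons
```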

In the ensuing years, biochemists demonstrated that 
the genetic code is virtually universal: the same three- 
base codons specify the same amino acids regardless 
of whether the codons reside in a bacterium, bee,
buttercup, or bear. The essential differences among species
of organisms are not in the nature of the DNA bases but in the
order in which the bases occur in the DNA molecules
or, in the case of proteins, the order of the amino acids.


The three-letter genetic code is universal and is used to 
specify proteins in all life on earth. This seemingly simple 
system allows for all the consistency and all the biological 
diversity on our planet! 


DNA sequences are copied into RNA sequences, 
and the order of the bases in the RNA transcript speci- 
fies the sequence of the amino acids in the protein 
product. These events reflect the core element of the
so-called central dogma of molecular biology, which 
encompasses the major gene expression pathways in 
living cells (see Figure 3.18). 

The work of many scientists over much of the twen- 
tieth century elucidated the relationship between chro- 
mosomes and proteins, as well as the genetic code that 
enables DNA to specify the proteins made by a given 
cell. The chemically similar polymer, RNA, acts as 
an intermediary and carries an imprint of the genetic 
information to be processed into a protein. Next, we 
will explore how the information travels from DNA 
to RNA and is then used to synthesize proteins—the 
process of gene expression. 


GENE EXPRESSION 


Genomes vary in size from just under 500 genes in the 
smallest known genome to more than 45,000 genes in 
the largest genome known in 2009, that of the black 
cottonwood tree. No matter the size of the genome, 
the genes in the genome are not all “turned on,” or 
expressed, at the same time. Genes are turned on and 
off as needed, by signals from the external environ- 
ment as well as signals from within the organism. In 
multicellular organisms, the pattern of gene expression 
determines cell type. For example, a neuron is different 
from a skin cell by virtue of the genes that have been 
expressed in its development and that are expressed as 
it performs its specialized functions. 

The control of gene expression is a complex and intri- 
cate business. It involves adjusting the level as well as 
the timing of gene activation and silencing (repression) in 
different cells. Control sequences in both DNA and RNA 
are “read” by regulatory proteins and RNA molecules that 
act in response to internal and external signals and turn 
genes on and off. Gene expression is controlled primarily
at the level of transcription, a word coined by Francis Crick
in 1956 to refer to the process of copying DNA sequences into
RNA. The product of the transcription of a gene is a
single-stranded RNA molecule.

Differences in cell structure, genomic structure, 
and complexity between prokaryotes and eukaryotes 
dictate that they exhibit important differences in gene 
expression pathways. Eukaryotic cells have nuclei, 
and eukaryotic genes usually contain intron and exon 
sequences. These differences have a fundamental 
impact on gene expression. 


Life demands that organisms respond to changing conditions. 
At the molecular level, these responses are often made via 
changes in gene expression. Differences in cell structure and 
complexity lead to major differences in how gene expression 
occurs in prokaryotic and eukaryotic cells. 
Box 3.1


Central Dogma: Misnamed, Blamed, and Framed 
The term “central dogma of molecular biology” was coined
in 1958 by Francis Crick, one of the discoverers of the DNA
double helix. The central dogma, often stated as “DNA makes
RNA makes protein,” for many years was the only known
pathway describing gene (DNA) expression. Now, many,
many experiments later, we have learned that there are many
exceptions to the central dogma; some viruses have RNA
genomes, and some copy RNA into DNA. This is how HIV,
the virus that causes AIDS, reproduces; once HIV enters a human cell, the
HIV RNA genome is copied into DNA by HIV’s highly special- 
ized reverse transcriptase enzyme. The DNA copied from the 
RNA genome becomes inserted (integrated) into the host cell 
DNA genome, and the HIV DNA becomes a permanent part 
of the human cell’s chromosome. It became routine for scien- 
tists to discover new exceptions to the central dogma rules! 
As a result of such discoveries, articles with titles such as “The 
Death of the Central Dogma” and “Dog Eat Dogma” proclaim 
the central dogma to be useless, and some say even harmful. 
They argue, why are we still teaching and modeling research 
on something as outmoded as this concept? And isn’t science 
supposed to be the antithesis of dogma, which can be thought 
of as an authoritative proclamation to be generally accepted 
without proof? 

Author Horace Freeland Judson asked Francis Crick about 
the meaning of “central dogma.” Crick explained, “My mind 
was that a dogma was an idea for which there was no rea- 
sonable evidence. You see?!” And Crick gave a roar of delight. 
“I just didn’t know what dogma meant. And I could just as
well have called it the ‘Central Hypothesis’... which is what
I meant to say. Dogma was just a catch phrase.” So actually,
Crick’s use of the word must be taken in the context revealed
through his own words. In 1970, Crick, criticized in light
of reports that claimed to refute the central dogma, wrote a
paper in the journal Nature that clarified his meaning and
helped people to better visualize the possible paths used to
transfer genetic information between DNA, RNA, and protein
(Figure 3.18). The central dogma rules out some of the
possible pathways (removes arrows) and suggests that other
gene expression pathways only occur under special
circumstances. Still unchanged, however, is the idea that
once an organism or cell has expressed its genetic information
at the protein level, the genetic information in the protein
cannot be sent in reverse into RNA or DNA.


FIGURE 3.18 The meaning of the “central dogma” of molecular
biology. (A) The many theoretical paths for genetic information
to pass between DNA, RNA, and protein. (B) The central
dogma rules out some of the possible pathways (removes arrows)
entirely and suggests that other gene expression pathways only
occur under special circumstances or in certain organisms
(dotted lines).


Several Types of RNA are Transcribed
from DNA


The major gene expression pathway in cells begins
with transcription, copying DNA into RNA (Figure
3.19). Cells transcribe several types of RNAs, which
fall into two broad categories: RNA that encodes
proteins (messenger RNA, mRNA) and RNA that does
not encode proteins (noncoding RNA, ncRNA). Two
ncRNAs play direct roles in protein synthesis. Transfer
RNA (tRNA) molecules ferry amino acids to the
protein-synthesizing machinery, the ribosomes, and
ribosomal RNA (rRNA) provides major functional and
structural aspects of ribosomes (Figure 3.20). In addition,
a variety of other ncRNAs perform essential functions,
including the chemical modification of other
RNA molecules, RNA splicing, and the regulation of
gene expression. Because RNA is single stranded, its
bases are free to pair with complementary bases elsewhere
on the same strand, forming stable base-paired structures.
In particular, ncRNAs form three-dimensional RNA
structures that carry out important cell functions. Although
mRNAs sometimes form base-paired stems and loop
structures, the main function of mRNAs is to carry
sequence information to the ribosomes.


mRNA Encodes Protein 


The RNA sequence of an mRNA carries the informa- 
tion that will be decoded into a chain of amino acids 
during protein synthesis on ribosomes in the cell. In 
prokaryotes, mRNAs often contain the coding regions
for several proteins, one after another, and are imme- 
diately used to synthesize proteins, even while the 
mRNA is still being transcribed from the DNA gene. 
In eukaryotes, mRNA is synthesized in the nucleus as 
a long precursor RNA strand that is processed exten- 
sively before it leaves the nucleus to be translated into 
protein in the cytoplasm. 


Ribosomal RNA 


rRNA provides essential structural and enzymatic func- 
tions in the protein synthesis machine, the ribosome 
FIGURE 3.19 Gene expression pathway. (A) Overview: Gene expression begins with transcription, the process of synthesizing an RNA copy
from a sequence of DNA. Noncoding RNAs, such as tRNA and rRNA, fold into functional molecules (not shown); mRNA, which encodes
proteins, is used as a template by a ribosome to synthesize a protein chain from amino acid building blocks. In eukaryotes, the mRNA
is transported out of the nucleus and into the cytoplasm where translation occurs. (B) Transcription: Strands of DNA separate as the RNA
polymerase enzyme copies the DNA bases into an mRNA strand. The sequence of the RNA is determined by base pairing with the DNA
template strand.


FIGURE 3.20 Ribosome and ribosomal RNA structure. (A) Three-dimensional structure of a bacterial ribosome. Proteins (grey, blue) and
rRNAs (purple) come together to make the protein synthesis machinery shown here. Prokaryotic ribosomes are made of more than 50
proteins and three rRNAs. (B) Close-up of a small part of the ribosome: three-dimensional structure of an rRNA fragment and ribosomal
protein. This small fragment of the 16S rRNA (blue backbone, gray bases) illustrates how its folding pattern allows it to bind closely with
a ribosomal protein (magenta).


(Figure 3.20). The critical functions of rRNA are
reflected in the fact that rRNA genes are among the most
highly conserved (least varied) sequences in evolution.
Ribosomes in both prokaryotes and eukaryotes
are formed from two subunits, each of which consists
of many proteins and one or more rRNAs. The rRNA
components provide the enzyme activity that catalyzes
the formation of the peptide bond between amino acids
during protein synthesis, and they interact with mRNA,
enabling it to bind to the ribosome. Prokaryotic ribosomes
are smaller than eukaryotic ribosomes and contain a total
of three rRNAs, whereas eukaryotic ribosomes contain
four rRNAs.


Transfer RNA 


tRNAs (Figure 3.21) are adaptor molecules that deliver 
the amino acid corresponding to a specific mRNA 
codon while the mRNA is associated with the ribos- 
ome. During protein synthesis, the bases in the tRNA 
FIGURE 3.21 Transfer RNA structure.
(A) The characteristic base-pairing pattern or secondary structure of all tRNAs resembles a cloverleaf.
(B) The three-dimensional folding of the tRNA molecule. At the 3’ (upper right) end of the molecule is the amino acid attachment site. The
lower portion of the molecule contains the tRNA anticodon, shown base paired to the mRNA codon. (C) A space-filling model of the tRNA
molecule shown in the same orientation as (B).
FIGURE 3.22 Role of tRNAs in translation. An amino acid is
attached to the acceptor stem of the appropriate tRNA, as determined
by the tRNA’s anticodon sequence. Once inside the ribosome, the
anticodon in the tRNA base-pairs with the codon in the mRNA being
translated.


anticodon pair with complementary bases in the codon
of the mRNA strand (Figure 3.22), bringing the correct
amino acid into position to be added to the growing
protein chain in the ribosome. These codon-anticodon
interactions translate the genetic code carried by the mRNA
into an amino acid sequence in the protein.
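Because the codon and anticodon pair in opposite (antiparallel) orientations, the anticodon written 5’ to 3’ is the reverse complement of the codon. A minimal sketch with a hypothetical helper name; it ignores the looser “wobble” pairing at the third codon position that lets some tRNAs read more than one codon.

```python
# RNA pairing partners (wobble pairing at the codon's third position is ignored).
PARTNER = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the anticodon (written 5'->3') that pairs with an mRNA codon (5'->3').

    Codon and anticodon pair antiparallel, so the anticodon is the
    reverse complement of the codon.
    """
    return "".join(PARTNER[base] for base in reversed(codon))


print(anticodon_for("AUG"))  # CAU, the anticodon that reads the AUG start codon
print(anticodon_for("AGU"))  # ACU
```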


Other Non-coding RNA 


Many types of ncRNAs perform a wide variety of
functions in eukaryotes. These include microRNAs
(miRNAs) and small interfering RNAs (siRNAs), which
are important in controlling gene expression; small
nuclear RNAs (snRNAs), which are involved in processing
mRNA; and small nucleolar RNAs (snoRNAs), which
help to modify the structures of certain bases in rRNA
and tRNA molecules. Prokaryotes have fewer kinds of
ncRNAs, some of which modulate gene expression.


The three best-known types of RNA (mRNA, tRNA,
and rRNA) are involved directly in protein synthesis.
Various other noncoding RNAs have important roles in the 
control of gene expression and chemical modification of 
other RNA molecules. 


RNA POLYMERASES COPY DNA INTO RNA 


RNA polymerase (RNAP) enzymes synthesize all
the different types of RNA transcripts made in both
eukaryotic and prokaryotic cells. These enzymes range
in size from about 100 kDa (kilodaltons) for the
single-protein enzyme of bacteriophage T7, to about
400 kDa for the multiprotein complex of bacteria
(Figure 3.23), and about 500 kDa for the multiprotein
complexes of eukaryotes.
Like the DNA polymerases that replicate DNA, RNA 
polymerases catalyze the addition of single nucle- 
otides to a growing chain in the 5’ to 3’ direction, but 
RNAPs make RNA copies that are composed of ribo- 
nucleotides instead of deoxyribonucleotides, which 
are found in DNA. 

The E. coli RNA polymerase contains a core set 
of protein subunits and one variable protein subunit, 
called the sigma factor. All the different forms of the 
sigma protein fit into the core enzyme and enable it 
to start transcription, but each sigma factor allows the 
polymerase to transcribe different sets of genes. The 
bacterial RNA polymerase enzyme copies DNA into 
RNA at about 50 bases per second and makes only about
one mistake per 10,000 bases added. A proofreading
mechanism is built into the enzyme complex, allowing
it to detect and remove an incorrectly added
nucleotide.
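The polymerase speed and accuracy quoted above support a quick back-of-the-envelope estimate. This sketch assumes a constant rate of 50 bases per second, treats 1 in 10,000 as the final error rate, and uses a hypothetical 1,500-base transcript; the function name is likewise hypothetical.

```python
def transcription_estimate(length_nt: int,
                           rate_nt_per_s: float = 50.0,
                           error_rate: float = 1 / 10_000) -> tuple[float, float]:
    """Return (seconds to transcribe, expected number of incorrect bases)."""
    seconds = length_nt / rate_nt_per_s
    expected_errors = length_nt * error_rate
    return seconds, expected_errors


seconds, errors = transcription_estimate(1_500)  # hypothetical 1,500-base transcript
print(f"{seconds:.0f} s, {errors:.2f} expected errors")  # 30 s, 0.15 expected errors
```

At these rates such a transcript takes about half a minute to make and, on average, carries far fewer than one error.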

In eukaryotes, transcription is necessarily more 
complex because of the larger number of genes to 
be expressed, and the increased gene regulation 
required to control the expression of thousands of 
genes in hundreds of different types of cells. This 
complexity is reflected in the three different RNA 
polymerase enzymes that transcribe different types 
of RNA molecules in eukaryotic cells. These large 
enzyme complexes each have a core of protein sub- 
units, five common additional protein subunits, and a 
variable number of specific subunits. Different com- 
binations of conserved and unique subunits allow the 
eukaryotic RNAP enzymes to transcribe different sets 
of genes. For example, RNAP I transcribes most rRNA
genes, whereas RNAP II transcribes the protein-coding
genes (Table 3.1).


FIGURE 3.23 A bacterial RNA polymerase. The polymerase 
enzyme has channels for the entry of the template and nontemplate 
strands of the DNA helix, and the emergence of the mRNA. 


TABLE 3.1 Eukaryotic RNA Polymerases 
The specific types of RNA transcribed by the three
different eukaryotic RNA polymerases were elucidated
using an inhibitor of transcription called α-amanitin, a
toxin from deadly mushrooms that blocks the function
of some eukaryotic RNA polymerases. Each year more
than a hundred people are fatally poisoned by eating
mushrooms containing α-amanitin. In the laboratory,
however, α-amanitin is useful for distinguishing between
the actions of the three polymerases, because each enzyme
responds differently when α-amanitin is added to the cell.
The toxin binds tightly to RNA polymerase II, immediately
blocking mRNA transcription by that enzyme. Higher
concentrations of α-amanitin are required to inhibit
RNA polymerase III, whereas RNA polymerase I
activity is unaffected by the toxin.
Using the differential sensitivity of RNAP enzymes
to α-amanitin, scientists developed assays to reveal
the functional connections between the three RNA
polymerase enzymes and the many types of RNA gene
products.
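The logic of these α-amanitin assays can be summarized as a simple decision rule. In this hypothetical sketch, each transcript is reduced to two yes/no observations (is its synthesis blocked at a low toxin concentration, and at a high one?), whereas real assays measure how much RNA is made across a range of concentrations.

```python
def polymerase_from_amanitin(inhibited_at_low: bool, inhibited_at_high: bool) -> str:
    """Infer which eukaryotic RNA polymerase made a transcript from its α-amanitin response.

    Follows the pattern described in the text: RNAP II is blocked by low
    concentrations, RNAP III only by higher concentrations, RNAP I not at all.
    """
    if inhibited_at_low:
        return "RNA polymerase II"
    if inhibited_at_high:
        return "RNA polymerase III"
    return "RNA polymerase I"


print(polymerase_from_amanitin(True, True))    # RNA polymerase II
print(polymerase_from_amanitin(False, True))   # RNA polymerase III
print(polymerase_from_amanitin(False, False))  # RNA polymerase I
```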

RNA polymerase I (RNAP I) synthesizes the eukaryotic
rRNAs 5.8S, 18S, and 28S, but not the 5S rRNA.
The rRNA genes exist in tandem repeats on eukaryotic
chromosomes and are rapidly transcribed by many
RNAP I enzymes, as shown in electron micrographs
of the rDNA genes that display dramatic arrays
of progressively longer rRNA transcripts forming a
characteristic Christmas tree structure (Figure 3.24).
The image also demonstrates that many RNA polymerase
molecules can transcribe the same gene at once, one
following right after another. This high level of rRNA
production allows the cell to respond rapidly to the need
for new proteins by making the numerous ribosomes
necessary to do the protein synthesis work. RNA
polymerase II (RNAP II) synthesizes mRNA from all
genes encoding proteins, and also makes many ncRNAs.
RNA polymerase III transcribes the transfer RNAs, the
5S rRNA needed for ribosomes, and some of the snRNAs.
Because RNAP II transcribes all of the protein-coding
genes, we will focus on transcription by RNAP II in
eukaryotic cells.


Type   Location    Cellular RNAs transcribed                          α-amanitin sensitivity
I      Nucleolus   18S, 5.8S, and 28S rRNA                            Insensitive
II     Nucleus     mRNA, most snRNA, miRNA, siRNA, snoRNA             Inhibited by low concentrations
III    Nucleus     tRNA and 5S rRNA, some snRNA and other ncRNAs      Inhibited by high concentrations

From www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=stryer.table.3982. Biochemistry, 5th ed., 2002, by Jeremy M. Berg, John L. Tymoczko, Lubert Stryer, and Neil D. Clarke, WH Freeman & Co, New York. Further adapted by FR with information from Molecular Biology of the Cell, Alberts, 2008, Garland Science, New York.
RNA polymerases are large, multisubunit protein enzymes
that synthesize RNA molecules from DNA templates. 
Prokaryotes have a single RNA polymerase to transcribe 
genes, whereas eukaryotic cells have three RNAPs. 


Genes Contain “Start” and “Stop” Signals 


Genes are not actively transcribed into RNA at all 
times; it is critical for cells to respond to signals that 
indicate which genes are required for the needs of the 
organism at a given time. For example, in single-celled 
organisms, these signals often indicate what kind of 
food source is available. Multicellular organisms must 
respond to the environment as well as to signals from 
cells and tissues. Shutting down transcription of a gene 
eliminates the need for long-term storage of the RNA 
and protein gene products in the cell. Although some 
‘housekeeping’ genes are expressed for the life span of the organism, most genes are transcribed only for a specified period of time and then are turned off when the gene product is no longer needed. In addition, in some cell types in multicellular organisms, many genes are never turned on because they are used only in other cell types.

FIGURE 3.24 RNA polymerase I transcribes rRNA. Eukaryotic genes for rRNAs are present in hundreds of identical copies arranged in tandem on the chromosome. When cells are growing and need large amounts of protein synthesis, the genes are transcribed by many RNA polymerases one after the other. The top panel shows three complete genes and part of a fourth. The enlargement of one of the genes shows the gradually increasing lengths of rRNA transcripts, each emerging from an enzyme on the DNA.

Genes can be regulated at several steps along the gene expression pathway. The first is the start signal that tells RNA polymerase to begin transcribing DNA into RNA. This signal is encoded in a region of DNA called a promoter, located upstream of (preceding) the gene’s coding region. RNA polymerase and various proteins bind to this DNA region and work together to initiate transcription. The promoter DNA sequence is highly conserved in prokaryotic genes and includes a TATA sequence at -10, called the TATA box, and another conserved sequence located at about -35 (Figure 3.25). These positions are counted backward from the first base that is transcribed into RNA, which is numbered +1. Promoters attract RNA polymerase to the DNA at the start of a gene, which is required for transcription to begin. A promoter consensus sequence is created by comparing many promoter sequences and choosing the base that appears most frequently at each position in the sequence (see Figure 3.25). Genes with promoters that closely resemble the promoter consensus sequence are transcribed more frequently. Also encoded in prokaryotic genes is a series of bases that serves as a transcription termination signal. Once transcribed, this signal forms a stem-loop structure in the mRNA that releases the mRNA from the template DNA. Termination can also involve additional proteins that bind to the termination sequence and promote release.
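
As a rough sketch of how a consensus sequence is built, the Python below takes a set of aligned sequences and picks the most frequent base at each position. The five -10 regions are invented for the example, and the match score (the number of positions that agree with the consensus) is only a crude stand-in for promoter strength, not a method taken from this book.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Most frequent base at each position of equal-length, aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # most_common(1) returns [(base, count)]; ties go to the base seen first
    return "".join(
        Counter(seq[i] for seq in aligned).most_common(1)[0][0]
        for i in range(length)
    )


def match_score(site: str, reference: str) -> int:
    """Count the positions at which a promoter region agrees with the consensus."""
    return sum(a == b for a, b in zip(site, reference))


# Invented -10 regions for illustration; none is identical to the consensus
sites = ["TAAAAT", "TATGAT", "GATAAT", "TATACT", "CATAAT"]
ref = consensus(sites)
print(ref)                                   # TATAAT
print([match_score(s, ref) for s in sites])  # [5, 5, 5, 5, 5]
```

As in Figure 3.25, the consensus emerges from the comparison even though no individual sequence matches it exactly.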

Eukaryotic RNAP II promoters vary in DNA sequence but typically encompass large regions of DNA located in front of the coding region of the gene. Some RNAP II promoters contain TATA sequences, usually located within 50 bases upstream of the transcription start site. Most eukaryotic protein-coding genes also contain additional control sequences called enhancers, which are often located at large distances from the gene. Transcription by RNAP II terminates by a different mechanism than in prokaryotic cells.
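
The numbering convention can be made concrete with a short sketch that lists TATA sequences lying within 50 bases upstream of the start site. The function and its inputs are hypothetical, and searching for the exact string TATA is a simplification; real promoter analyses match a more flexible consensus.

```python
def tata_positions(upstream: str, window: int = 50) -> list[int]:
    """Coordinates of each 'TATA' that starts within `window` bases upstream
    of the transcription start site.

    `upstream` is the DNA immediately 5' of the start site, so its last base
    is position -1 (the first transcribed base is +1; there is no position 0).
    """
    hits = []
    first = max(0, len(upstream) - window)
    for i in range(first, len(upstream) - 3):
        if upstream[i:i + 4] == "TATA":
            hits.append(i - len(upstream))  # string index -> negative coordinate
    return hits


near = "G" * 20 + "TATAAA" + "G" * 24  # 50 bases; TATA starts at -30
far = "TATA" + "G" * 56                 # 60 bases; TATA starts at -60
print(tata_positions(near))  # [-30]
print(tata_positions(far))   # []
```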
FIGURE 3.25 Prokaryotic promoter recognition. The promoter is where RNA polymerase binds to DNA to initiate transcription of a gene. The consensus sequences for the two regions of prokaryotic promoters are shown. The ideal distance between the two regions is 17 bp, but the exact spacing varies from promoter to promoter. Transcription starts at nucleotide +1. Note that no promoter is identical to the consensus sequences.
Promoters are DNA sequences that indicate where tran-
scription should start. In addition, the promoter influences 
which genes are expressed in the cell. 


Promoter selection by RNAP enzymes is a major 
mechanism used to control gene expression in both 
prokaryotic and eukaryotic cells. Exercising control 
at this initial point in the gene expression pathway 
has the advantage of conserving cellular energy and 
resources. The first level of transcriptional control 
is promoter strength, measured by the rate of suc- 
cessful transcription initiation in the absence of any 
additional regulation. Weak promoters support only 
occasional transcription, whereas strong promoters ini- 
tiate more frequent transcription events. 

In prokaryotes, different sigma factor subunits ena- 
ble RNA polymerase to recognize and bind tightly to 
specific DNA promoter regions to initiate the tran- 
scription of selected genes. Each sigma subunit ena- 
bles RNAP to transcribe sets of genes that share similar 
promoters. Various sigma factors are synthesized in 
response to changes in environmental conditions. E. 
coli, which lives in the sheltered environment of the 
gut, has seven different sigma factors that enable the
cells to alter gene expression quickly when faced with 
conditions that cause stress or changes in metabolism. 
Many bacterial species live under less controlled con- 
ditions and have far greater numbers of sigma factors 
to respond to a wider variety of stresses. Once tran- 
scription has started, the sigma factor leaves the RNAP 
complex and RNAP works its way along the DNA, 
separating the strands and catalyzing the formation 
of the messenger RNA (mRNA) copied from the DNA 
template strand (Figure 3.26). Most prokaryotic genes 
are arranged in groups called operons that are under 
the control of a single promoter. In operons, the RNAP 
creates an mRNA copy that encodes multiple genes,
called a polycistronic transcript. 
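
To make the polycistronic idea concrete, here is a simplified sketch that copies a template strand into RNA and then splits the transcript into reading frames, each running from an AUG to the next in-frame stop codon. The helper names are hypothetical, and the scan ignores ribosome-binding signals and the alternative start codons some bacterial genes use.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}
PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """RNA copied from a DNA template strand written 5'->3'.

    RNA polymerase reads the template 3'->5' and builds RNA 5'->3', so the
    transcript is the reverse complement of the template (with U for T).
    """
    return "".join(PAIR[base] for base in reversed(template_5to3))


def find_orfs(mrna: str) -> list[str]:
    """Split a polycistronic mRNA into AUG...stop reading frames, left to right."""
    orfs = []
    i = 0
    while i <= len(mrna) - 3:
        if mrna[i:i + 3] != "AUG":
            i += 1
            continue
        # Read codons in frame from this AUG until a stop codon appears
        for j in range(i, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOP_CODONS:
                orfs.append(mrna[i:j + 3])
                i = j + 3  # resume scanning after the stop codon
                break
        else:
            break  # no in-frame stop codon: not a complete reading frame
    return orfs


operon_mrna = "GG" + "AUGAAAUAA" + "CC" + "AUGUUUGGGUGA" + "A"
print(find_orfs(operon_mrna))  # ['AUGAAAUAA', 'AUGUUUGGGUGA']
print(transcribe("TTACAT"))    # AUGUAA
```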
Each of the three eukaryotic RNA polymerases
interacts with a different set of transcription factor pro- 
teins that enable the RNA polymerase to transcribe the 
correct genes (Figure 3.27). In addition to general tran- 
scription factors, there are many tissue-specific tran- 
scription factors that bind to the enhancer sequences 
sometimes located thousands of bases away from the 
promoter. This observation was puzzling to scien- 
tists until it was established that the DNA helix forms 
a loop that brings the enhancer DNA region and the 
bound proteins into the proximity of RNAP II, which 
allows the formation of a huge complex containing 
RNAPII and all the proteins required for transcription 
to start (Figure 3.28). A large multiprotein complex called Mediator coordinates all the activators with RNAP II and the transcription factors.

Access to the DNA template is further complicated in eukaryotic cells because the chromosomal DNA in the nucleus is packaged as chromatin, a complex of DNA and proteins in which the DNA is wrapped into nucleosomes about every 200 base pairs (Figure 3.29; see Chapter 12). Chromatin remodeling machines move ahead of RNA polymerase II along the DNA helix, removing nucleosomes from the template before it is transcribed. Once all of its component parts are assembled, the RNAP II transcription complex contains more than 100 proteins. As the transcript emerges from RNAP II, specialized proteins and RNA-protein particles bind to it. These complexes serve a variety of purposes in the processing and transport of the eukaryotic RNA transcript.


The initiation of eukaryotic transcription is more com- 
plex than that of prokaryotes. In addition to having pro- 
moter regions at the start of genes, eukaryotic genes have 
enhancer regions that are often located at a great distance 
from the genes they regulate. Many proteins must assem- 
ble into a large molecular machine before transcription 
can start. 
FIGURE 3.26 Prokaryotic transcription. After the sigma factor dissociates from RNA polymerase, transcription proceeds. The region of DNA opened inside the polymerase is called the transcription bubble. The polymerase is shown working its way through the second of two adjacent genes in the DNA; both will be encoded in one mRNA transcript.
FIGURE 3.27 Eukaryotic RNA polymerase II and general transcription factors. All eukaryotic polymerases require a set of general transcription factors to initiate transcription. (A) For RNAP II to initiate transcription, transcription factor TFIID binds to the TATA box, where it severely distorts the path of the DNA helix, probably helping to make the region stand out in a very large genome. (B) TFIIB binds to the -35 region of the promoter and helps to position RNAP II correctly at the start site. Transcription factor TFIIH unwinds the double helix and phosphorylates (transfers phosphate groups to) the “tail” region of the RNAP II enzyme, which triggers RNA polymerase to begin transcribing the gene. TFIIE and TFIIF help to attract the other transcription factors and stabilize the complex.


Eukaryotic mRNA Transcripts Require Processing

A nucleus gives eukaryotic cells an advantage by increasing the number of steps in the flow of genetic information at which the cell can control the fate of gene products. This is particularly true for RNA transcripts that require extensive posttranscriptional processing before the mature mRNAs can be transported into the cytoplasm and translated.

Eukaryotic mRNA transcripts are synthesized as 
precursor mRNAs that carry the necessary genetic 
information to make a protein, but the precursor 
mRNAs must undergo three major processing events 
before export from the nucleus: 


• A methylguanosine cap structure is added to the 5’ end of the mRNA.

• The precursor RNA is spliced to remove introns and join exons together.

• A poly-A tail is added to the 3’ end of the precursor RNA.


A modified guanine nucleotide cap is added to the 
5’ end of the RNA transcript as soon as the RNA tran- 
script emerges from the polymerase complex (Figure 
3.30). Following cleavage to remove one phosphate 
from the 5’ end of the RNA, the guanyl transferase 
enzyme adds a guanosine to the RNA transcript using 
a reverse, 5’ to 5’ bond (as opposed to the usual 5’ to 
3’ linkage), and the guanine cap is methylated. A cap- 
binding complex binds to the completed 5’ cap, which 
prevents the precursor mRNA from being nonspecifically degraded in the nucleus. During protein synthesis, the 5’ cap helps recruit the small ribosomal subunit.
Although other types of eukaryotic RNAs are chemi- 
cally modified, the methyl G cap is added only to RNA 
molecules synthesized by RNAP II. 

Most eukaryotic genes are interrupted genes: the precursor mRNA transcript produced by RNAP II contains introns and exons that must be spliced to create the mature mRNA (Figure 3.31). The concept of interrupted genes was first demonstrated in 1977 by Richard Roberts at Cold Spring Harbor Laboratory and by Philip Sharp at Massachusetts Institute of Technology, who shared the 1993 Nobel Prize in Physiology or Medicine for their discovery of RNA splicing. Precursor mRNAs contain exons that carry the expressed genetic information in the gene. In the precursor RNAs, the exons are separated by introns that vary from 100 to 100,000 bases in length. The introns are the intervening sequences in the RNA precursor, which must be removed by splicing before the RNA can leave the nucleus. After the introns are removed, the exons are joined together in a precise splicing process that preserves the reading frame from one exon to the next and rarely makes a mistake.
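
Once the exon boundaries are known, joining the exons is simple bookkeeping, as in the sketch below. The coordinates are supplied by hand here, which is an assumption: the spliceosome has to find them from consensus sequences and other signals, a step this code does not attempt. The made-up intron begins with GU and ends with AG, the pattern most introns follow.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments in order and drop everything between them (introns).

    `exons` holds (start, end) pairs: 0-based and end-exclusive, like slices.
    """
    for (_, end1), (start2, _) in zip(exons, exons[1:]):
        if start2 < end1:
            raise ValueError("exons must be in order and must not overlap")
    return "".join(pre_mrna[start:end] for start, end in exons)


exon1, intron, exon2 = "AUGAAA", "GUAAGUCCCUAG", "CCCUGA"
pre = exon1 + intron + exon2             # exon1: 0-6, intron: 6-18, exon2: 18-24
mature = splice(pre, [(0, 6), (18, 24)])
print(mature)  # AUGAAACCCUGA  (read as AUG AAA CCC UGA)
```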

The multicomponent spliceosome is composed of small nuclear ribonucleoprotein particles (snRNPs), which are, in turn, made up of small nuclear RNAs (snRNAs) and proteins. The snRNA components perform essential aspects of the splicing process, including the critically important selection of the exact sites for cutting and ligating the RNA, and they are involved in the catalytic activity of the spliceosome. Splice sites are defined by consensus sequences, encoded in the DNA and present in the precursor RNA, that base pair with the snRNAs during splice-site selection. However, much more is involved in choosing a splice site than just the splice-site consensus sequences. During transcription, components of the spliceosome are carried on the RNAP II complex and transferred to the precursor mRNA, where they mark the location of each splice site and guide the choice of the next splice site as the RNA is transcribed. In a typical pre-mRNA splicing reaction, snRNPs bind to the consensus sequence sites in the precursor RNA and then come together to form a spliceosome complex (Figure 3.32). During the splicing reactions, the intron forms a lariat shape. In genes with many introns, this reaction must take place many times to remove all the introns from the final mRNA. The lariat structures are released and subsequently degraded.

FIGURE 3.28 RNA polymerase II transcription initiation requires enhancers. Regulatory proteins bind to enhancer regions of the DNA and make contact with RNAP II through the Mediator complex, causing the DNA to form loops involving large regions of genomic DNA (1 to 20 kb). The gene’s promoters and enhancers create a gene regulatory region that can be much larger than the gene itself, with many enhancers (both negative and positive) modifying recruitment of RNA polymerase and allowing very fine tuning of gene expression. In addition, chromatin remodeling proteins precede the complex to unwrap the DNA from nucleosomes.

FIGURE 3.29 Nucleosomes package eukaryotic DNA. In eukaryotes, DNA is wrapped around histone proteins (blue and yellow) into structures called nucleosomes that occur about every 200 base pairs. The nucleosomes are compacted together into higher orders of chromatin structure that are responsible for the dense packing of eukaryotic chromosomes during mitosis (cell division). The DNA together with these proteins is called chromatin.


snRNPs assist in the essential processing of eukaryotic 
mRNAs. Several mechanisms work together to guarantee 
the extraordinary accuracy of splicing. 
Alternative RNA Splicing Generates Complexity


Alternative splicing is a critically important mechanism controlling the types of proteins synthesized in eukaryotic cells. The actual number of human genes is thought to be about 20,000, yet the human body expresses more than 100,000 different proteins in its 60-100 trillion cells. To “stretch” 20,000 genes into more than 100,000 proteins, at least some genes must be able to produce more than one protein product. Alternative splicing allows the cell to select which exons in each transcript will be removed or retained by the splicing process (Figure 3.33). An exon might be retained in some mRNAs if it encodes a special protein function or contains an alternative 3’ cleavage and poly-A addition site (discussed in the next section). In addition, the cell can select certain intron sequences to be retained in the mature mRNA.

FIGURE 3.30 5’ cap added to eukaryotic mRNAs. A methylated guanosine cap is added to the 5’ end of the mRNA transcript as it emerges from RNAP II. Following this, a cap-binding complex (CBC) protects the mRNA from degradation, helps in transport out of the nucleus, and assists in binding to the ribosome.

FIGURE 3.31 Splicing of eukaryotic precursor mRNA. Splicing consists of the precise removal of intron sequences and the ligation of exons together to make a much shorter transcript containing an intact protein coding region.

FIGURE 3.32 snRNPs form spliceosomes. (A) The spliceosome is formed by snRNPs, particles consisting of proteins and small nuclear RNAs. The spliceosome forms on the pre-mRNA, and the reaction requires base pairing between the snRNA components and the pre-mRNA. The snRNAs catalyze RNA splicing and intron removal via the formation of a lariat structure.

The variation in the protein products from a single gene often plays a role in the tissue specificity of the protein produced; one version of the protein may be produced in skeletal muscle, for example, whereas a different form of the protein is required in heart muscle cells. Scientists have found many examples of complicated but beneficial alternative splicing patterns at work in eukaryotic cells. The existence of alternative splicing has challenged the traditional concept of a gene and the interpretation of the one gene-one enzyme hypothesis. Alternative RNA splicing adds greatly to the genetic flexibility of eukaryotic genomes by offering additional ways to fine-tune gene expression. However, combined with the ambiguity of splicing consensus sites, it also makes predicting protein sequences from genomic data much more difficult (see Chapter 7).
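
Exon choice can be sketched by enumerating every combination of optional (cassette) exons while keeping exon order fixed. The exon names, sequences, and the choice of which exons are optional are hypothetical; a real cell does not produce every combination, because regulatory proteins determine which variants appear in which tissue.

```python
from itertools import product


def splice_variants(exons: list[tuple[str, str]], optional: set[str]) -> dict[str, str]:
    """All mRNAs obtainable by including or skipping each optional exon.

    `exons` is an ordered list of (name, sequence); exons not named in
    `optional` are always kept. Exon order never changes.
    """
    choices = [(True, False) if name in optional else (True,) for name, _ in exons]
    variants = {}
    for keep in product(*choices):
        kept = [(name, seq) for (name, seq), k in zip(exons, keep) if k]
        label = "-".join(name for name, _ in kept)
        variants[label] = "".join(seq for _, seq in kept)
    return variants


exons = [("E1", "AUGGCU"), ("E2", "GAA"), ("E3", "CCC"), ("E4", "UAA")]
for label, mrna in splice_variants(exons, {"E2", "E3"}).items():
    print(label, mrna)
# E1-E2-E3-E4 AUGGCUGAACCCUAA
# E1-E2-E4 AUGGCUGAAUAA
# E1-E3-E4 AUGGCUCCCUAA
# E1-E4 AUGGCUUAA
```

With two optional exons the sketch yields 2 × 2 = 4 variants. Each optional exon here is a multiple of three bases long, so skipping it keeps the downstream codons in frame.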


Poly-A Tail Is Added When Transcription Terminates


The act of adding a poly-A tail to the mRNA transcript 
is tightly coupled with the termination of eukaryotic 
transcription. Signals encoded in the DNA trigger an 
enzyme complex associated with RNAP II to cleave 
the mRNA, which effectively terminates transcrip- 
tion of that gene. The poly-A polymerase associated 
with the RNAP II complex then adds many adenine 
(A) nucleotides, one after another, to the 3’ end of the 
RNA. This poly-A tail is not encoded in the DNA; it is 
attached to the RNA after transcription of the gene is 
complete. The AAUAAA sequence, which is located 
about 20 bases upstream from the 3’ end of the tran- 
script, is required for cleavage of the RNA molecule 
and the addition of the poly-A tail (polyadenylation). 
The poly-A tail is essential for the function of RNAPII 
transcripts. The poly-A tail interacts with several poly-A 
binding proteins that function to protect the completed 
RNA transcript. 
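
The cleavage-and-tailing step can be sketched as follows. The code cuts 20 bases past the end of the AAUAAA signal, an assumed reading of "about 20 bases", and adds a 200-base tail, an assumed round number; real cleavage sites also depend on other sequences that this sketch ignores.

```python
SIGNAL = "AAUAAA"


def cleave_and_polyadenylate(pre_mrna: str, downstream: int = 20,
                             tail_length: int = 200) -> str:
    """Cut `downstream` bases past the end of the first AAUAAA and add a poly-A tail.

    Both defaults are illustrative assumptions; real cleavage sites and tail
    lengths vary.
    """
    site = pre_mrna.find(SIGNAL)
    if site == -1:
        raise ValueError("no AAUAAA signal: transcript would not be cleaved here")
    cut = site + len(SIGNAL) + downstream
    # The A's are added by poly-A polymerase; they are not copied from the DNA
    return pre_mrna[:cut] + "A" * tail_length


pre = "AUGCCCUAAGGG" + "AAUAAA" + "C" * 30
result = cleave_and_polyadenylate(pre, tail_length=10)
print(result)  # AUGCCCUAAGGGAAUAAA + 20 C's + 10 A's
```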

On completion of the processing of eukaryotic precursor mRNA into mature mRNA, the capped, spliced, polyadenylated mRNA (Figure 3.34) is exported through pores in the nuclear membrane into the cytoplasm, where translation takes place. The transport of mRNA is regulated so that only correctly and fully processed mRNAs are exported out of the nucleus.


Eukaryotic mRNAs are capped, spliced, and polyade- 
nylated before leaving the nucleus. RNA splicing provides 
the opportunity for creating multiple protein products from 
a single gene. 


In the cytoplasm, the cap-binding complex is replaced by a protein factor that interacts with the proteins bound to the poly-A tail of the same RNA transcript. This circularizes the mRNA into a closed loop that can be translated by many ribosomes at once, which greatly speeds the process of making new proteins (Figure 3.35).


Prokaryotes Couple Transcription and Translation

The RNAs made in prokaryotic cells typically have no introns and lack a 5’ cap and a poly-A tail; prokaryotic mRNAs are usually identical to the original RNA transcript copied from the DNA. Because bacterial cells have no nucleus, transcription and translation can be coupled (Figure 3.36), creating one continuous, economical process in which genes are copied into RNA and the RNA is translated into protein nearly simultaneously. As we have seen, the process in eukaryotes is quite different, involving processing of mRNA and the complete physical separation of transcription and translation.

FIGURE 3.33 Alternative RNA splicing increases genetic flexibility. Alternative splicing patterns produce three different proteins from one gene. Different exons can be selected for inclusion or removal by the spliceosome. Proteins A, B, and C differ because of the different exons included in the alternatively spliced mRNAs.

FIGURE 3.34 Summary of mRNA processing in eukaryotes. The processes of mRNA maturation are shown in the order they occur during transcription: (1) Shortly after RNA polymerase initiates transcription, the emerging 5’ end of the transcript is capped, and the cap-binding complex binds to the cap. (2) As the transcript continues to emerge, it collects proteins that mark exon/intron boundaries, and snRNPs bind to the pre-mRNA and form spliceosomes. (3) Spliceosomes read consensus sequences and other signals for splicing and remove the introns. (4) In response to a specific sequence in the DNA, the 3’ end of the pre-mRNA is cleaved and the poly-A tail is added. (5) The mature mRNA transcript travels to the cytoplasm through the nuclear pores.

FIGURE 3.35 Translation by ribosomes on circularized mRNA transcripts. In eukaryotes, an initiation factor (eIF4) replaces the cap-binding complex and engages with the proteins that coat the poly-A tail, circularizing the transcript while it is translated. Many ribosomes can translate a single transcript at the same time.

FIGURE 3.36 Bacterial cells couple transcription and translation together. In prokaryotes, ribosomes start translation as mRNA transcripts emerge from RNA polymerase. Multiple ribosomes translate one mRNA simultaneously. Polycistronic mRNA transcripts enable groups of genes arranged in operons to be efficiently transcribed and translated.

FIGURE 3.37 Overview of translation on a ribosome. Ribosomes engage with mRNA near the transcript’s 5’ end, forming three binding sites for tRNAs when fully assembled. When a tRNA anticodon base pairs to the codon in the first site, the amino acid it carries is added to the growing amino acid chain at the heart of the ribosome. “Empty” tRNAs exit as the ribosome moves along the transcript.


PROTEIN SYNTHESIS REQUIRES mRNA AND RIBOSOMES


Translation is the process whereby the genetic code 
embedded in an mRNA molecule is decoded to cre- 
ate the chain of amino acids that constitutes a specific 
protein (Figure 3.37). The genetic code, read in groups 
of three nucleotides at a time, is shared by virtually all life on Earth, with few exceptions (chiefly found in the DNA of mitochondria). The three-nucleotide codon
specifies the same amino acid in an E. coli cell as it 
does in a human cell. This conservation of the genetic 
code is reflected in the very similar protein synthe- 
sis machinery and mechanisms in prokaryotes and 
eukaryotes, and it is used to great advantage in bioin- 
formatics databases that allow scientists to find similar 
genes and proteins in different organisms to explore 
their functions. 
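
Because the code is read three bases at a time, the decoding step can be sketched with a lookup table. The table below is the standard genetic code, written as one letter per amino acid in UUU, UUC, UUA, UUG, UCU, ... order with "*" for stop codons; it does not cover the mitochondrial variants mentioned above, and the translate function assumes the reading frame starts at the first base.

```python
from itertools import product

BASES = "UCAG"
# Standard code: one letter per codon, in UUU, UUC, UUA, UUG, UCU, ... order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Read codons from the first base in groups of three until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCUGAAUAA"))                      # MAE
print(sum(aa == "L" for aa in CODON_TABLE.values()))  # 6 codons for leucine
```

Counting table entries shows the redundancy of the code: leucine, for example, is specified by six different codons.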

The matching of codons to amino acids is not 
direct—that is, an amino acid does not bind to a 
codon in order to be added to the protein chain. All 
20 amino acids are ferried to the site of protein synthesis by tRNA molecules, which contain three-base
anticodons that bind to the codons in the mRNA. The
opposite end of the tRNA from the anticodon contains 
a site where an amino acid is attached to the tRNA by 
an aminoacyl tRNA synthetase enzyme. There are 20 
such enzymes in the cell, one for each amino acid. As 
the genetic code table shows (see Figure 3.17), there 
is more than one codon that specifies most of the 20 
amino acids. Because each tRNA molecule has a different anticodon, there must be multiple tRNAs carrying the same amino acid in the cell. Each enzyme must recognize and attach its amino acid to more than one tRNA, but not to an incorrect tRNA. To accomplish this, the enzyme binds closely to its cognate tRNAs and contacts several different sites in the tRNA in addition to the anticodon to ensure that the tRNA is a correct partner (Figure 3.38). The result is a tRNA bonded to an amino acid, which is called a “charged” tRNA.
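
Pairing between codon and anticodon can be sketched as a reverse complement, because the two RNA strands pair antiparallel. This toy version assumes strict Watson-Crick pairing; in real cells, wobble pairing at the third codon position lets one anticodon read more than one codon, so fewer distinct tRNAs are needed than this simple count suggests.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Anticodon, written 5'->3', that base pairs with a codon written 5'->3'.

    The strands pair antiparallel, so the anticodon is the reverse complement.
    """
    return "".join(PAIR[base] for base in reversed(codon))


# Glycine's four codons need four different anticodons under strict pairing
glycine_codons = ["GGU", "GGC", "GGA", "GGG"]
print({c: anticodon(c) for c in glycine_codons})
# {'GGU': 'ACC', 'GGC': 'GCC', 'GGA': 'UCC', 'GGG': 'CCC'}
print(anticodon("AUG"))  # CAU, the anticodon of the initiator tRNA
```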


Translation in prokaryotes and eukaryotes is similar. In the 
process of translation, tRNAs carry amino acids to a bind- 
ing site in the ribosome where the anticodon in tRNA is 
matched by base pairing to a specific mRNA codon. 


In both prokaryotes and eukaryotes, ribosomes 
are the cellular machines of protein synthesis. They 
FIGURE 3.38 Aminoacyl tRNA synthetases charge tRNAs. Every tRNA engages with an enzyme that attaches the correct amino acid to the tRNA. Cells have 20 aminoacyl tRNA synthetases, one for each amino acid. Because there are more than 20 tRNAs in every cell, most of these enzymes must recognize multiple tRNAs. The complex formed by glutaminyl-tRNA synthetase and its partner, a glutamine tRNA, shows that the enzyme examines many points of contact to ensure that the tRNA it operates on is one of the correct ones for that amino acid.
consist of two subunits that come together to form a ribosome only in the presence of mRNA, a special initiator tRNA, and initiation factor proteins. Ribosomes differ in size and exact composition in eukaryotes and prokaryotes but are very similar to each other (Figure 3.39). Fully assembled ribosomes contain one binding site for mRNA and three binding sites for tRNAs
(Figure 3.40). The tRNA binding sites are named for 
their functions in protein synthesis. The aminoacyl (A) 
site is where tRNAs bearing an amino acid enter the 
ribosome (except the first tRNA; discussed later). In the 
peptidyl (P) site, a tRNA carries the peptide being syn- 
thesized. Finally, the ‘empty’ tRNA leaves the ribosome 
via the exit (E) site. 

Protein synthesis begins with the assembly of the 
ribosome and a tRNA onto the mRNA transcript, with 
the help of protein initiation factors (Figure 3.41A). 
FIGURE 3.39 Prokaryotic and eukaryotic ribosomes. Fully assembled prokaryotic and eukaryotic ribosomes are extremely large complexes with molecular weights in the millions of daltons. In both cases, dozens of proteins come together with rRNAs to form two subunits. The ribosomal proteins, which make up about one-third of the ribosome structure, mainly seem to support and stabilize an rRNA core that has the peptidyl transferase activity, which forges the peptide bond between amino acids to make the protein.
FIGURE 3.40 Binding sites in ribosomes. Three sites for tRNAs are formed when the ribosomal subunits come together, along with a channel for the mRNA to travel through. These are shown (A) schematically and (B) as a molecular model, determined by x-ray crystallography, of the prokaryotic ribosome with bound tRNAs. The tRNAs are bound in the A, P, and E sites. The anticodons of the A and P site tRNAs are in contact with their respective codons on the mRNA transcript, shown as a string of gold balls that mark the path of the mRNA transcript through the ribosome. The growing polypeptide is not shown.


The initiation factors in eukaryotes and prokaryotes are 
different, but they accomplish the same goal: uniting 
the mRNA, ribosomal subunits, and the initiator tRNA.
The codon signifying the start of translation is AUG, 
which codes for the amino acid methionine (met). 
Therefore, all proteins initially start with methionine 
as the first amino acid, but the methionine is often 
removed later. 

The first step in initiation is the binding of the ini- 
tiator tRNA, which always carries the amino acid 
methionine (in prokaryotes it is chemically modified to 
formylmethionine), to the P site of the small ribosomal 
subunit with the help of two initiation factors. Next, 
the mRNA binds to the complex with the assistance of 
additional initiation factors. In eukaryotes, two initiation 
factors bind to the 5’ methylguanosine cap of mRNA, 
which enables it to bind to the small ribosomal subunit. 
The small subunit then moves along the mRNA, scanning for the first AUG start codon. Additional ribosomes can
engage the transcript after the first ribosome moves on. 
Prokaryotic mRNAs contain a Shine-Dalgarno sequence 
just upstream from the start codon. With the help of 
prokaryotic initiation factors, this RNA sequence base 
pairs to a sequence in the rRNA present in the small sub- 
unit of the ribosome. This arrangement places the AUG 
start codon in the correct position to permit the large 
ribosomal subunit to join the complex. This results in the 
formation of a complete ribosome containing the initia- 
tor tRNA base paired through its anticodon to the mRNA 
AUG start codon. Energy is provided for translation initiation by the hydrolysis of high-energy chemical bonds in two molecules of GTP.
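
The two ways of finding the start codon can be contrasted in a short sketch. The eukaryotic version simply takes the first AUG from the 5’ end; the prokaryotic version looks for an AUG shortly downstream of a Shine-Dalgarno-like motif. The motif AGGAGG and the 10-base maximum gap are assumed values, and both functions ignore the sequence context around the AUG that also affects real start-site choice.

```python
def eukaryotic_start(mrna: str) -> int:
    """Scanning model: begin at the 5' (capped) end and stop at the first AUG."""
    return mrna.find("AUG")


def prokaryotic_start(mrna: str, sd: str = "AGGAGG", max_gap: int = 10) -> int:
    """Index of the first AUG beginning within `max_gap` bases after a
    Shine-Dalgarno-like motif, or -1 if none is found."""
    pos = mrna.find(sd)
    while pos != -1:
        window = pos + len(sd)
        # str.find's end bound is exclusive, so leave room for all three bases
        start = mrna.find("AUG", window, window + max_gap + 3)
        if start != -1:
            return start
        pos = mrna.find(sd, pos + 1)
    return -1


m = "AUGCC" + "AGGAGG" + "UAAC" + "AUGGCUUAA"
print(eukaryotic_start(m))   # 0  (the first AUG, before the motif)
print(prokaryotic_start(m))  # 15 (the AUG just downstream of the motif)
```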

With the initiator tRNA in the P site, the ribos- 
ome engaged at the start codon is ready to receive a 
charged tRNA in the A site (Figure 3.41B). Each suc- 
cessive tRNA is accompanied to the site by an elon- 
gation factor: EF-Tu (prokaryotes) or EF-1 (eukaryotes). 
These protein factors help to ensure that the codon- 
anticodon base pairing is correct. If it is, GTP is hydro- 
lyzed and the tRNA is shifted slightly into an ideal 
position for peptide bond formation to occur. The 
peptidyl transferase activity of the ribosome, supplied 
entirely by RNA, forges the peptide bond between the 
two amino acids, transferring the resulting dipeptide to 
the tRNA in the A site. At this point, another elongation 
factor bound to GTP binds to the ribosome, and GTP 
is hydrolyzed to provide energy for the ribosome to 
move three nucleotides along the mRNA. This moves 
the A and P site tRNAs into the P and E sites, respec- 
tively, and the tRNA in the E site dissociates from the 
ribosome. The tRNA in the P site is holding the dipep- 
tide and is ready for the cycle to repeat, resulting in 
sequential additions of amino acids to the growing 
polypeptide chain. As the protein chain emerges from 
the ribosome, it begins the process of folding. Some 
proteins can fold fully as they exit the ribosome; others 
require proteins called chaperones, which help newly 
synthesized proteins to fold into proper 3D structures. 
Elongation by the ribosome continues until a stop 
codon is encountered in the mRNA sequence.

FIGURE 3.41 Eukaryotic translation initiation, elongation, and termination on the ribosome. (A) Translation initiation requires several initiation
factors, only a few of which are shown here. The factors assist in the assembly of the initiator tRNA with the small subunit, followed by the
binding of mRNA, which is recognized by both its cap and its poly-A tail. This complex scans for the first AUG in the transcript (prokaryotes use
a different mechanism to find the first AUG). When the AUG codon is engaged, the large subunit joins to form the complete ribosome com-
plex. This process costs the cell two high-energy molecules in the form of GTP. (B) Elongation of the protein chain involves a repeated cycle of
events as the ribosome moves along the mRNA. The tRNAs landing in the A site provide the new amino acids to add; the existing polypeptide
is added on to the newly arrived amino acid, and then the ribosome translocates one codon further, moving the tRNA holding the polypep-
tide to the P site. Additional protein elongation factors assist in the process, and GTP hydrolysis again provides energy. (C) Stop codons termi-
nate protein synthesis. A protein release factor that mimics the shape of a tRNA binds to the stop codon in the A site, and the protein chain is
released. The ribosomal subunits dissociate and can re-form the ribosome on the beginning of another, or even the same, transcript.


Three codons in the genetic code function as “stop” 
signals for translation. Proteins that mimic the shape 
and properties of a tRNA molecule interact with the 
stop codons in the A site (Figure 3.41C). These molecu-
lar mimics, called release factors, do not carry amino
acids, but they do trigger the activity of the peptidyl
transferase. The enzyme adds a water molecule to the
polypeptide in place of an amino acid so that instead 
of being transferred to a waiting amino acid, the new 
protein is released and leaves the ribosome. The ribos- 
omal subunits separate from each other and from the 
mRNA, ready to bind to the initiator components and 
begin translation again. 
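The initiation, elongation, and termination steps described above can be mimicked at the sequence level with a short Python sketch. It assumes an idealized ribosome that starts at the first AUG, reads one codon per cycle, and stops at the first in-frame stop codon. The codon table is deliberately partial: it holds only a few codons from the standard genetic code, so any other codon raises a KeyError. Factors, GTP, and the A, P, and E sites appear only as comments.

```python
# Partial standard genetic code: only a handful of codons, for illustration.
CODON_TABLE = {
    "AUG": "Met", "UCU": "Ser", "UGG": "Trp", "AAA": "Lys",
    "GAA": "Glu", "GCU": "Ala", "UUU": "Phe", "GGC": "Gly",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna: str) -> list[str]:
    """Return the residues added between the first AUG and the first in-frame stop."""
    start = mrna.find("AUG")          # initiation: initiator tRNA pairs with AUG in the P site
    if start == -1:
        return []
    chain: list[str] = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]         # the codon presented to incoming tRNAs
        if codon in STOP_CODONS:      # termination: a release factor reads the stop codon
            break
        chain.append(CODON_TABLE[codon])  # add the residue this codon specifies (Met for AUG)
    return chain


print(translate("GGAUGUCUUGGAAAUAAGC"))  # ['Met', 'Ser', 'Trp', 'Lys']
```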


Translation takes place in three phases: initiation, repeated 
cycles of elongation, and termination. In addition to 
mRNAs and ribosomes, each phase of translation requires 
protein factors and energy derived from GTP hydrolysis. 


To become functional, the newly made protein must 
finish folding correctly. Proteins that fold incorrectly
are usually degraded; if this process is faulty, seri- 
ous disease can result. In some cases, proteins join 
together as subunits to form complexes or molecular 
machines. Many proteins must bind to small helper 
molecules to function; an example is hemoglobin, 
which must contain heme in order to bind and carry 
oxygen. In eukaryotes, many proteins also undergo 
posttranslational modifications in order to achieve full 
functionality. There are numerous types of modifica- 
tions to proteins: formation of disulfide bonds, covalent 
attachment of carbohydrates or lipids, the addition of 
small organic groups such as phosphates, methyl and 
acetyl groups, and the removal of part of the polypep- 
tide chain are just a few of the possibilities. 


Gene Expression Responds to Changes in
Environment and Development 


Precise control of gene expression is an essential fea- 
ture of the life of an organism. Genes are turned on or 
off in response to signals from the outside environment 
and from inside the cell as conditions change. The
mechanisms used by cells to control or regulate gene 
expression differ greatly in prokaryotic and eukaryotic 
cells, in part because prokaryotic cells largely control 
gene expression at the level of transcription initiation, 
whereas eukaryotic cells not only regulate transcrip- 
tion initiation but also impose controls at many other 
points along the gene expression pathway. 


Prokaryotic Gene Control: Under the 
Influence of an Operon 


Consider how dramatically and rapidly the growth 
conditions can change for a bacterium living in the
human gut. The bacterium must adapt constantly and 
the ability to rapidly change its gene activity helps it 
to adapt and survive. Many bacterial genes are organ- 
ized into operons, small groups of related structural 
genes that are transcribed as a unit under the control 
of a promoter and an additional regulatory sequence 
called an operator (Figure 3.42A). Bacterial operons 
typically encode enzymes that participate in a particu- 
lar biochemical pathway. Seventy-five different oper- 
ons controlling 250 genes have been identified in the 
E. coli genome. 

At any given time, most prokaryotic genes are inhib- 
ited from transcription by repressor proteins that bind 
to operator sequences. The operator is near or overlaps
with the promoter, so a repressor protein bound to
an operator physically blocks the RNA polymerase’s
access to the promoter, preventing transcription from
occurring. We will look at two well-studied bacterial
operons that demonstrate how repressors and other
regulatory proteins respond to signals to regulate gene
expression.

FIGURE 3.42 Trp operon control. (A) The trp operon consists of the 5 genes for the biosynthesis of tryptophan, plus the control region. (B)
When tryptophan levels are low, the repressor protein is inactive and unable to bind to the operator region of DNA; RNA polymerase binds
to the promoter and transcription proceeds. (C) As tryptophan levels rise, tryptophan can bind to a site on the repressor, changing the repres-
sor conformation so that it binds to the operator DNA. This prevents RNA polymerase from binding to the promoter, shutting off transcription.
(D) Two stem-loop structures can form in the mRNA transcript of the trp operon. Which one forms is dependent on the speed of the ribosome
on the mRNA: a faster ribosome (upper panel) covers a specific part of the transcript, leading to a structure that causes premature transcrip-
tion termination. A slow ribosome (lower panel) covers a different area, and the mRNA forms the alternative mRNA structure that allows
transcription of the trp operon to proceed. The speed of the ribosome is, in turn, dependent on the availability of tryptophan, because it is
translating a peptide that contains tryptophan.


Proteins mediate repression and activation of bacterial 
genes in response to signals from inside and outside the 
cell. Prokaryotic genes are often organized into operons 
that can be controlled as a unit. 


The genes of the pathway for biosynthesis of the amino 
acid tryptophan are in the bacterial trp operon. When 
tryptophan is plentiful in the environment, the trp 
operon is silent (not transcribed) because of a repressor 
protein that binds to the trp operator DNA, preventing 
RNA polymerase from binding to the promoter (Figure
3.42C). However, when tryptophan is not readily avail-
able, the trp operon is actively transcribed, and the
enzymes needed for the biosynthesis of tryptophan are
expressed (Figure 3.42B).

How does the bacterial cell know when to make 
tryptophan? The answer is that tryptophan itself binds
to the repressor protein to enable it to bind the opera- 
tor site and block transcription. Without tryptophan, 
the repressor is unable to bind to the operator, and 
transcription proceeds. This negative feedback loop 
allows the cell to respond to an abundance of tryp- 
tophan by ceasing to make the amino acid. 
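The negative feedback loop can be illustrated with a toy simulation. Every number here is hypothetical. Tryptophan is tracked in arbitrary units, and the repressor is assumed to occupy the operator whenever the level reaches a fixed threshold. Each time step adds a fixed amount while the operon is on and subtracts a fixed amount used for protein synthesis.

```python
def simulate_trp(steps: int, trp: float = 0.0, threshold: float = 10.0,
                 synthesis: float = 3.0, consumption: float = 1.0) -> list[float]:
    """Toy negative-feedback model of the trp operon; all parameters are hypothetical."""
    history = []
    for _ in range(steps):
        # Tryptophan-bound repressor sits on the operator once trp reaches the threshold.
        repressed = trp >= threshold
        if not repressed:
            trp += synthesis               # operon transcribed: enzymes make tryptophan
        trp = max(trp - consumption, 0.0)  # protein synthesis consumes tryptophan
        history.append(trp)
    return history


print(simulate_trp(12))
# [2.0, 4.0, 6.0, 8.0, 10.0, 9.0, 11.0, 10.0, 9.0, 11.0, 10.0, 9.0]
```

The level climbs while the operon is on and then hovers around the threshold, which is the behavior a negative feedback loop is expected to produce.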

The trp operon is also an excellent example of a 
means of further adjusting the rate of gene expression, 
called attenuation (Figure 3.42D). The mechanism 
of attenuation depends on the coupling of transcrip- 
tion and translation in prokaryotes: while prokaryo- 
tic mRNA is being transcribed, ribosomes bind and 
work their way along the transcript, synthesizing the 
encoded protein(s). The trp operon contains a “leader” 
sequence upstream of the genes in the operon. Part of 
the leader RNA, which encodes a short polypeptide, 
can base pair with itself to form two different stem- 
loop RNA structures. Depending on how quickly the 
ribosome can translate the leader polypeptide, one of 
two different stem-loop structures forms in the mRNA 
located between the ribosome and RNA polymerase. 

One of these structures allows the RNA polymer- 
ase, which is ahead of the ribosome, to continue tran- 
scription; the other stem-loop structure causes the 
RNA polymerase to release the DNA and transcription 
stops. The key is the position of the ribosome along the 
mRNA, and whether or not it physically covers parts of
the mRNA required for each of these RNA structures
to form. If the ribosome can translate the leader pep-
tide quickly, it covers a section further along the tran-
script, and the mRNA forms a structure that triggers
RNA polymerase to fall off the operon DNA before it
transcribes the trp biosynthesis genes. If the ribosome
is too slow, the alternative stem-loop structure forms,
which allows RNA polymerase to proceed along
the DNA and transcribe the operon. So the question
becomes, what controls the speed of the ribosome in
translating the leader sequence?

FIGURE 3.43 lac operon structure and function. (A) (upper) The structure of the lactose operon DNA and the regulatory gene that precedes
it. The lac operon contains a promoter (P), an operator (O), and three genes (Z, Y, A) required for metabolism of lactose. Upstream from the
operon, the lac repressor gene (I) is encoded. Also upstream is a region that binds the catabolite activator protein (CAP). (lower) The biochem-
ical reaction to break down lactose into glucose and galactose, which is catalyzed by the enzyme beta-galactosidase (encoded by gene Z). (B)
(upper) In the absence of allolactose, the repressor binds and blocks transcription of the lac operon. (lower) When lactose is available, the
allolactose inducer molecule binds to the repressor protein, which inactivates the repressor and removes it from the operator, allowing a low
level of transcription of the genes for metabolism of lactose. (C) A summary of the lac operon under three different environmental conditions.

If you think it should have something to do with how 
much tryptophan is present, you are correct. The leader 
peptide contains two tryptophan amino acids. Without
a ready supply of tryptophan, the ribosome stalls as it 
waits for the amino acid, and it covers a specific part 
of the mRNA transcript. As a result, the mRNA forms 
the stem-loop structure that allows RNA polymerase to 
proceed with transcription. In contrast, when enough 
tryptophan is present to make the leader peptide, the 
ribosome moves quickly through the region of mRNA, 
uncovering the mRNA that forms the structure that 
causes RNA polymerase to fall off the DNA.


The trp operon exemplifies the typical default repression of 
many prokaryotic operons and the activation triggered in 
response to a cellular need. In addition, attenuation con- 
trol of the operon allows for more delicate regulation. 
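Attenuation depends on timing: how long the ribosome takes to get through the leader compared with how far RNA polymerase has moved. The Python sketch below is a toy timing model, not a physical one. The leader codons are hypothetical, chosen only to include the two tryptophan codons (UGG) that the text mentions. The per-codon time, stall time, and decision window are arbitrary assumed units.

```python
# Hypothetical leader: includes two adjacent Trp codons (UGG) and ends in a stop codon.
LEADER = ["AUG", "AAA", "GCU", "UUC", "UGG", "UGG", "CGC", "ACU", "UCC", "UGA"]


def attenuation_outcome(leader: list[str], trp_available: bool,
                        codon_time: float = 1.0, stall_time: float = 5.0,
                        window: float = 12.0) -> str:
    """Return 'terminate' or 'continue' for RNA polymerase (toy timing model)."""
    elapsed = 0.0
    for codon in leader:
        if codon == "UGG" and not trp_available:
            elapsed += stall_time     # ribosome waits for scarce tryptophan-charged tRNA
        else:
            elapsed += codon_time
    # Fast ribosome: it clears the leader within the window, covers the region further
    # along the transcript, and the terminator stem-loop forms.
    if elapsed <= window:
        return "terminate"
    # Stalled ribosome: it covers a different region, and the alternative stem-loop forms.
    return "continue"


print(attenuation_outcome(LEADER, trp_available=True))   # terminate
print(attenuation_outcome(LEADER, trp_available=False))  # continue
```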


The lac operon of E. coli has been studied since the 
1950s and exemplifies how genes can be regulated 
in response to more than one environmental circum- 
stance. E. coli’s preferred source of energy is glucose, 
but when necessary, the bacterium can also metabo- 
lize other sugars, such as lactose, the sugar found in 
milk. The lac operon contains the genes for breaking
lactose down into its constituent chemical parts (glu- 
cose and galactose) (Figure 3.43A). We'll consider 
three conditions that the lac operon should respond 
to appropriately: (1) lactose is available but there is lit- 
tle or no glucose available; (2) only glucose is avail- 
able as a food source; (3) glucose and lactose are both 
available. Under condition 1, the lac genes should 
be highly expressed, and under condition 2, the lac 
operon should be silent (turned off). Under condition 
3, a small amount of lac operon expression is useful.

The presence or absence of lactose plays an
important, but incomplete, role in lac operon expres-
sion (Figure 3.43B). When lactose is absent, the lac-
tose repressor protein (product of the lacI repressor
gene, not part of the lac operon) binds to the opera-
tor region and covers part of the promoter, preventing
RNA polymerase from binding to the promoter site and
thereby blocking transcription. When lactose becomes
available, some gets converted to allolactose (an iso-
mer of lactose). The lac repressor has a binding site
for allolactose, and in response to allolactose bind-
ing, the repressor’s DNA binding site changes shape
and no longer binds to DNA, leaving the promoter
site on the DNA exposed and available for transcrip-
tion. Thus, allolactose serves as a signal molecule that
alerts the cell to the presence of lactose, and it acts 
as an inducer of lac operon expression by removing 
repression. 
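The repressor logic just described, together with the glucose-sensing CAP activator introduced in the next paragraph, can be summarized as a small truth table. In this hedged Python sketch each input is simply present or absent, and the output is reduced to "off," "low," or "high." Real expression levels are graded, and the function name is hypothetical.

```python
def lac_expression(lactose: bool, glucose: bool) -> str:
    """Toy model of lac operon output: 'off', 'low', or 'high'."""
    # Without lactose there is no allolactose, so the repressor stays on the operator.
    repressor_on_operator = not lactose
    # CAP is active and binds its DNA site only when glucose is scarce.
    cap_bound = not glucose
    if repressor_on_operator:
        return "off"                       # RNA polymerase is blocked regardless of CAP
    return "high" if cap_bound else "low"  # the weak promoter alone gives low expression


for lactose, glucose in [(True, False), (False, True), (True, True), (False, False)]:
    print(f"lactose={lactose}, glucose={glucose}: {lac_expression(lactose, glucose)}")
```

The first three calls correspond to conditions 1 to 3 in the text; the last covers the case where neither sugar is available.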

However, because of the weak lac operon pro-
moter, inducing expression via allolactose only per-
mits a very low rate of expression. If glucose is present
as well as lactose (condition 3), this is satisfactory, but
if no glucose is present (condition 1), lactose metab- 
olism must be ramped up sharply to meet the cell’s 
energy needs. Therefore, the cell must also sense the 
level of availability of glucose. This is accomplished 
via an additional regulatory protein, the catabolite 
activator protein (CAP, Figure 3.43C). CAP is activated 
in response to a lack of glucose and binds to a control
sequence of DNA upstream of the promoter. CAP
assists RNA polymerase in binding to the promoter,
greatly increasing the rate of transcription and provid-
ing plenty of lac operon gene products for deriving
energy from the breakdown of lactose. When lactose is
no longer plentiful, allolactose levels will drop and the
lac repressor will return to the operator DNA, silencing
transcription once again.

The lac operon is an illustration of how two
sources of information from a cell’s environment can
be integrated to elicit the appropriate response. Of
course, even bacterial cells are subject to more than
two signals at a time. In eukaryotes, the integration
of signals increases in complexity with the size of the
genome and the increasing complexity of multicellular
organisms.

Box 3.2 Which Gene Makes Us Human?

We have discussed many of the basic biochemical processes
that account for life, including gene storage and expression,
the synthesis of RNAs, and the production of proteins. We
have also begun to explore how cells control gene expression
and decide which genes are transcribed into RNA, when, and
why. Knowing the DNA sequence of a gene or a genome is
not enough to reveal many secrets about the living organism.
Perhaps this is best explained by the controversy surrounding
the sequence differences between humans and chimpanzees.
At the time of the human genome project completion, the
human and chimp genomes were thought to share over 98%
of their sequence, inspiring questions such as “Which gene
makes us human?” “We've known for a while that the pro-
tein coding genes of humans and chimpanzees are about 99
percent the same,” said Yale scientist Michael Snyder. “The
challenge for biologists is accounting for what causes the
substantial difference between the person and the chimp.”
(Science 317: 815-819, 2007).

A more recent study indicates that the suggested 98% sim-
ilarity in DNA between chimps and humans should be revised
to about 95%. Either way, humans and chimps apparently dif-
fer by only 2% to 5% of their DNA bases. Conventional wis-
dom says that if the difference between humans and chimps
is not due to the gene sequence, then it must be due to dif-
ferences in the control of gene expression that influence
the production of similar proteins in humans and chimps.
Experimental support for this idea was lacking until recent
studies reported that closely related species of yeast use very
different patterns of gene regulation, even in cases where
the end outcome of the gene regulation was the same.
Scientists have found that gene regulation is the key to the
large differences observed between species. So we know that
it is not just the genes that count; it’s how (and when) you turn
them on that matters.

EUKARYOTIC GENE REGULATION

Gene regulation in bacteria is relatively simple com-
pared with the much more complex systems that con-
trol gene expression in eukaryotic cells. Part of the
issue is the number of genes; a human cell has more
than 20,000 genes compared to only about 2500
for a single-celled bacterium. The human body has
more than 200 cell types and, consequently, executes
vastly different patterns of gene expression in differ-
ent tissues. For example, in order to function cor-
rectly, human liver cells and human brain cells must
express very different genes as well as a common set
of “housekeeping” genes, those required for any cell
to live. Moreover, a eukaryotic organism undergoes
many stages of development (such as embryonic, fetal, 
adolescent, and adult development) and requires vary- 
ing regulation of gene expression in cells at different 
stages of life. The complexity of eukaryotes is reflected 
in the many opportunities for regulation of the gene 
expression pathway (Figure 3.44). 

Gene expression regulation starts with transcription 
initiation, and we will take a broad look at how this 
step is regulated. A detailed study of the many specific 
processes that regulate this and other steps of eukaryo- 
tic gene expression is beyond the scope of this book; 
interested students can look further by exploring books 
and web sites listed at the end of the chapter. 

In eukaryotes, DNA is wrapped around histone proteins,
tightly bundled into nucleosomes and higher-order 
structures that prevent transcription unless they are dis- 
assembled for RNA polymerase initiation complexes 
to gain access. Chromatin remodeling machines are 
a necessary part of the RNA polymerase machinery. 
Because of this default “off” state of genes, transcrip- 
tional activators are more common than repressors. 
Rather than acting singly, eukaryotic transcription 
factors combine to form complexes that have varying 
effects depending on their exact makeup (Figure 3.45). 
These smaller complexes contribute to a much larger 
picture of combinatorial control integrated by the 
mediator protein that connects the transcription fac- 
tors scattered over long distances of DNA to the RNA 
polymerase initiation complex. The initiation of tran- 
scription occurs only if the balance of all these inputs 
falls on the side of activation. 
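Combinatorial control can be caricatured as a weighted vote. In the Python sketch below, each bound complex contributes a positive (activating) or negative (repressing) effect. The same factor can have opposite effects depending on its partner, and a single summation stands in for the integrating role of the mediator. The factor names, effect values, and threshold are all hypothetical, and real integration is not a simple sum.

```python
# Hypothetical complexes: factor "A" activates with partner "B" but represses with "C".
COMPLEX_EFFECTS: dict[frozenset[str], float] = {
    frozenset({"A", "B"}): 2.0,
    frozenset({"A", "C"}): -1.5,
    frozenset({"D"}): 0.5,
}


def initiation_occurs(bound_complexes: list[frozenset[str]], threshold: float = 1.0) -> bool:
    """Sum the effects of all bound complexes (a crude stand-in for the mediator's
    integration) and initiate only if the balance falls on the side of activation."""
    balance = sum(COMPLEX_EFFECTS.get(c, 0.0) for c in bound_complexes)
    return balance >= threshold


print(initiation_occurs([frozenset({"A", "B"})]))                         # True
print(initiation_occurs([frozenset({"A", "C"}), frozenset({"D"})]))       # False
print(initiation_occurs([frozenset({"A", "B"}), frozenset({"A", "C"})]))  # False
```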
FIGURE 3.44 Opportunities for regulation of eukaryotic gene expression. The pathway of gene expression in eukaryotes has many stages,
thanks to the compartmentalization of the cell and the complexity of mRNA processing. The steps are outlined, and the stages where expres- 
sion can be regulated are noted. Eukaryotic gene expression can be controlled at all steps; only some of these steps would apply to prokaryo- 
tic gene expression because of the lack of a nucleus and the absence of mRNA processing. 


Among many additional controls of gene expres- 
sion is the regulation via small ncRNAs. For example, 
microRNAs (miRNAs) transcribed by RNA polymer-
ase II contain sequences complementary to parts of
mRNAs. The binding of miRNA to mRNA triggers deg- 
radation of the mRNA by cellular machinery, providing 
a way to turn off or dampen gene expression during 
or after transcription. Small interfering RNAs (siRNAs) 
also exert control over eukaryotic gene expression (see 
Chapter 11). 
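At its simplest, finding where a miRNA might pair with an mRNA is a search for complementary sequence. The Python sketch below assumes the commonly used "seed" rule: perfect complementarity to nucleotides 2 to 8 of the miRNA. The text does not describe this rule; it is a simplification drawn from general practice. Both sequences are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the RNA strand that base pairs antiparallel with `rna`."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, mrna: str) -> list[int]:
    """Return 0-based mRNA positions fully complementary to the miRNA seed
    (nucleotides 2-8 counted from the miRNA's 5' end)."""
    seed = mirna[1:8]
    site = reverse_complement(seed)
    return [i for i in range(len(mrna) - len(site) + 1) if mrna[i:i + len(site)] == site]


mirna = "UACGGAUCAAGUUCGAGCAUUA"   # hypothetical miRNA
mrna = "AAAGAUCCGUAAAAGAUCCGUCC"   # hypothetical mRNA segment
print(seed_sites(mirna, mrna))      # [3, 14]
```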

Regulation occurs at many points along the gene 
expression pathway, including the posttranscriptional 
processing and transport of mRNA molecules. These 
multiple and complex levels of gene regulation often 
challenge scientists as they experimentally reproduce 
and test different aspects of the gene expression mecha- 
nisms in the cell. Regardless of complexity, the regula- 
tory mechanisms must be considered, understood, and 
resolved if DNA technology is to be successfully applied 
to treating complicated genetic diseases (see Chapter 
10) and developing new therapeutic approaches (see 
Chapter 11), including developing designer drugs (see 
Chapter 13). The regulatory aspects of gene expression 
are as important to gene activity as the fundamental 
processes of transcription and translation. 


SUMMARY 


The realization that chromosomes are involved in 
heredity led scientists to investigate the role of DNA in 
living systems. Garrod’s studies indicated that enzymes 
and proteins are related to gene activity. Beadle and 
Tatum showed that a gene regulates the production of 
an enzyme, a concept that was extended by Ingram’s
experiments. He concluded that genes regulate protein 
production and that a mutation that affects the iden- 
tity of even a single amino acid in a protein can have a 
devastating effect. RNA was found to act as an interme- 
diary between the information in DNA and the amino 
acid sequence of a protein chain. Crick’s group was 
instrumental in identifying a three-base sequence in 
DNA as the basic unit of the genetic code, the codon. 

Much of gene expression is the pathway from 
DNA through the intermediary RNA to the synthesis 
of proteins; however, DNA also encodes RNA that is 
a final gene product. Transcription is the act of copy- 
ing RNA from the genetic messages in DNA templates. 
Translation is the transfer of the specific genetic mes- 
sage in an mRNA to a corresponding amino acid 
sequence in the protein. DNA encodes three forms of 
RNA involved in translation: messenger RNA (mRNA), 
whose codons transport the genetic message; ribos- 
omal RNA (rRNA), which is used to construct ribos- 
omes; and transfer RNA (tRNA), which transports 
amino acids to the ribosomes and whose anticodons 
complement the codons in mRNA. Translation in both 
eukaryotes and prokaryotes occurs on ribosomes, 
requiring charged tRNAs, mRNA, and a variety of ini- 
tiation, elongation, and termination factors that ensure 
accurate protein synthesis. 

Genes in bacteria are frequently organized into 
operons, which often consist of a single transcription 
unit including several genes with related functions 
that are transcribed as one polycistronic mRNA mol- 
ecule. Because of the absence of a nucleus, a bacterial 
mRNA can be efficiently transcribed and translated at 
the same time. In contrast, the typical eukaryotic tran- 
scription unit encodes only a single gene, but inher- 
ent in the gene itself are options that can change the 
nature of the protein product. For example, many
genes have alternative RNA splicing patterns that allow
the synthesis of more than one protein from a single
gene. Additionally, transcription and translation occur
in separate compartments of the cell.

FIGURE 3.45 Combinatorial control of eukaryotic transcription initiation. (A) Individual transcription factor proteins. (B) Transcription fac-
tors in eukaryotes act in combination and can have different effects depending on their partners in a complex. Differing combinations of reg-
ulatory sequences form the control regions in DNA that attract various combinations of partners. (C) The combinatorial effect is compounded
by the integration of many signals by the mediator protein. The various inputs are themselves combinatorial, vastly increasing the possibilities
for fine-tuned, cell-type-specific regulation.

Gene expression is controlled by regulatory pro- 
teins in response to signals from the cell and from the 
environment. Differential gene expression is required 
to preserve cell energy and chemical resources, to 
respond to environmental changes, and to provide 
for the growth and development of an organism. In 
prokaryotic cells, negative control of gene expression 
occurs when a regulatory protein binds to a regula- 
tory site in the DNA called the operator and inhibits 
transcription. Positive control occurs when an activa- 
tor protein stimulates transcription by encouraging the 
binding of RNA polymerase to the promoter DNA site. 
Eukaryotic gene expression is complex because of the 
separation of the nuclear and cytoplasmic compart- 
ments and the demands of multicellular organisms. 
In addition to having promoter regions at the start of 
genes, eukaryotic genes transcribed by RNAP II have
control regions that can lie at a great distance from 
the genes they regulate. Many signals are integrated 
to determine whether the transcription of a eukaryotic 
gene should begin. 


REVIEW 


This chapter has focused on the mechanisms by which 
the genetic message in DNA is expressed in a pro- 
tein. To assess your understanding of the discussions, 
answer the following questions: 


1. Briefly summarize the insight of Garrod and the 
work of Beadle and Tatum, and show why both 
were important to the evolving interest in DNA. 

2. What did Ingram’s experiments indicate, and why 
were his results significant? 

3. What evidence pointed scientists to the role of 
RNA in protein synthesis, and how did Crick’s 
experiments verify the three-base codon in the 
DNA molecule?

4. Name three types of RNA that participate in
protein synthesis, and specify the function of 
each type. 

5. Describe the synthesis of RNA taking place in 
transcription, and specify how mRNA is modified 
before it is exported from the nucleus of a eukary- 
otic cell. 

6. Explain the structure and function of transfer RNA 
molecules. 

7. Summarize the activities of translation taking 
place at the ribosome once the mRNA and tRNA 
molecules have arrived. 

8. Compare negative and positive controls that can 
influence the expression of a gene. Use the proc- 
esses of repression and activation as guides for 
your comparison.

9. Explain how the lac operon integrates and 
responds to signals from the environment report- 
ing the availability of lactose and glucose. Finish 
by drawing the operon and CAP site under 
conditions where neither glucose nor lactose is 
available. 

10. Describe in broad terms why gene controls are 
necessary in a cell and why regulation of gene 
expression is more complex in human cells than 
in bacterial cells.

Designer Enzyme Cuts HIV Out of Infected Cells


Bacterial Enzyme Turns the Tables on Deadly Retrovirus 
Scientific American, June 28, 2007 

By J. R. Minkel 

Scientists have constructed a custom enzyme that 
reverses the process by which the human immunodefi- 
ciency virus (HIV) inserts its genetic material into host 
DNA, suggesting that treatment with similar enzymes 
could potentially rid infected cells of the virus. In tests on 
cultured human tissue, the mutated enzyme, Tre recombi- 
nase, snipped HIV DNA out of chromosomes. 


Scientists designed an enzyme that can remove the 
human immunodeficiency virus (HIV) DNA from the 
host genome, as a test to see if it might be possible 
to engineer an enzyme to attack the HIV genome in 
humans and potentially rid the infected cells of the HIV 
virus and the infection. The function of the engineered
Tre recombinase enzyme was first tested in cultured
human tissue cells in the lab. The enzyme cut the cell’s
genomic DNA at both ends of the integrated HIV
DNA, removing it from the chromosome, and rid the
cells of the virus. This exciting advance in the lab was
tempered by the caveat that potential AIDS treatments
can often take a decade to progress to clinical trials. The
media hype about a possible cure for AIDS obscures
the underlying achievement of this novel experiment, 
the ability to extend the reach of a traditional biotech- 
nology tool, the restriction enzyme, directly into the 
patient. 

In this chapter the reader will find out about the 
structure and function of restriction enzymes, truly tal- 
ented proteins that made research in molecular genetics 
and biotechnology possible for nearly 40 years. Each 
restriction enzyme acts like a “molecular scissors” that 
cuts across the DNA helix. Each restriction enzyme cuts 
DNA at only one specific DNA sequence, the restric- 
tion enzyme recognition site. Each time the restriction 
enzyme finds its recognition sequence in the DNA, it 
cuts across both strands of the DNA helix at that site 
only, and not at other sequences in the DNA. For dec-
ades, restriction enzymes have been used routinely to
produce DNA fragments carrying specific genes, which
are then moved into vectors by recombinant DNA
technology for expression in host cells (see Chapter 5).
The large number of biotechnology companies
marketing more than 3000 different restriction
enzymes reflects the importance of restriction enzymes 
to the progress of biomedical research. 
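Because each restriction enzyme cuts only at its recognition sequence, predicting the fragments from a known DNA sequence is a simple string search. The Python sketch below uses EcoRI as the example and assumes its well-known recognition site, GAATTC, with a cut between the G and the first A. It follows only one strand of a linear molecule and ignores the staggered ("sticky") ends that form when both strands are cut. The function name and test sequence are hypothetical.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut the top strand of a linear DNA at every recognition site.
    Defaults model EcoRI: recognition site GAATTC, cut between G and A."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)   # keep scanning; other sequences are never cut
    fragments.append(dna[start:])
    return fragments


print(digest("AAGAATTCTTTTGAATTCCC"))  # ['AAG', 'AATTCTTTTG', 'AATTCCC']
```

Joining the fragments recovers the original sequence, and a sequence that lacks the site comes back as a single uncut fragment.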

Recently scientists have created designer restriction 
enzymes that cut DNA at desired base pair sequences 
that are “programmed” into the structure of the cus- 
tom-built enzymes. Customized restriction enzymes 
are a good example of the many innovative tools and 
technologies in use and currently under development 
to study DNA and gene products in cells. 


LOOKING AHEAD 


This chapter focuses on the scientific discoveries 
and developments that create the foundations for the
practical applications of DNA technology that
made the DNA revolution possible. When you have 
completed this chapter, you should be able to do the 
following: 


• Discuss the biochemical tools used in DNA tech-
nology, and recognize why microorganisms occupy
an important place in this area of DNA science.

• Explain the discoveries and experiments that formed
the basis for the key features of modern DNA
technology.

• Identify some of the individuals whose work laid
the foundations for DNA technology.

• Explain the basic processes used in recombinant
DNA technologies including bacterial transforma-
tion and cutting and ligation of DNA.

• Discuss the ethical and safety concerns that have
been raised about DNA technology over the years,
and describe the role of the Asilomar conference.

• Recognize some important terms and concepts used
in DNA technology, and broaden your vocabulary
to include words common to this area of science.


INTRODUCTION 


During the 1950s and 1960s, scientists made substan- 
tial gains in molecular biology as they clarified the role 
of DNA in the biochemistry of protein synthesis and 
explained the intricate details of this process. In the 
1970s, scientists began to manipulate DNA and devised 
methods to cut and join DNA fragments to produce 
recombinant DNA molecules (as we will see in this 
chapter). These recombinant DNA experiments intro-
duced the era of DNA technology.

The science of DNA technology was a new frontier 
to explore. Researchers discovered, for example, that 
they could transplant animal genes into bacterial cells 
and coax the genes to function in this new environ- 
ment; they kindled hopes that plants, noted for their 
ability to produce carbohydrates, could be engineered 
to produce proteins; and they laid the foundations for 
treatments for genetic diseases (see Chapter 15). In addi- 
tion, the future held promise for inexpensive sources of 
bioenergy, the development of novel vaccines, and the 
mass production of pharmaceuticals (see Chapter 13). 
Some saw no limit to what could be accomplished by 
DNA science. 

Today, in the twenty-first century, much of the amaz- 
ing progress predicted by scientists has come to pass or 
is within sight. The public benefits from a large selec- 
tion of proteins produced in genetically engineered 
organisms including bacteria, yeast, and mammalian 
cells, which can produce insulin, human hemoglobin, 
human growth hormone, detergent enzymes, and many 
other proteins from different species. Many more advances
are on the horizon as well. Therapeutic antibodies are
now available to combat cancer and immunological
diseases, and an effective vaccine against the human
papillomavirus (HPV) that causes cervical cancer is
now available (Gardasil from Merck & Co.). DNA
technology and the general field of molecular biology 
continue to have staggering potential for improving 
pharmaceutical yields, diagnosing and treating human 
disease, and creating new genetic testing and diagnosis 
products for use at home (see Chapters 10, 11, 13). 

At the same time, DNA technology has triggered a 
degree of public concern and confusion. It has created 
dilemmas for governmental and regulatory agencies, 
and in general it has raised fears that genetically modi- 
fied organisms, especially in agriculture, could pose a 
threat to human welfare. We will discuss many of these 
issues in the chapters ahead as we consider the founda- 
tions for DNA technology and the practical applications 
of the awesome abilities it gives us. For the present, we 
will explore the thought processes and discoveries that 
laid the foundations for modern DNA technology. The 
fruits of this technology are the subject matter for the 
remainder of this book. 


As genetic engineering research blossomed in the mid-to- 
late twentieth century, it raised both hopes and concerns 
about biotechnology. 


TOOLS OF GENETIC ENGINEERING 


Humans have guided the flow of genetic information
for thousands of years without realizing it, for example,
by breeding useful agricultural plants from wild ancestral
stock seeds (Figure 4.1). New organisms can be created
by breeding organisms that ordinarily do not mate.
A plumcot, for example, is the offspring of a cross
between a plum and an apricot, and a mule is derived
from breeding a horse with a donkey. The approaches
used in modern DNA technology go well beyond the 
genetic crosses of the past. Scientists can now intervene 
directly in determining the genetic fate of organisms,
and many of these manipulations can be performed outside
the cell in test tubes. By taking DNA fragments from different
species and connecting the DNA fragments together in 
new combinations, researchers can now make entirely 
new chromosomes that profoundly change the charac- 
ter of the organisms. An early example is the human 
insulin protein expressed in bacteria carrying the 
human insulin gene DNA on a plasmid. Arguably, this 
is a new species of bacterium. 

During the 1960s and 1970s, the new DNA technology
required new approaches to science, new insights, and
new materials. Methods to cut DNA and connect DNA
fragments together had to be found, organisms had to
be enlisted to carry the new fragments, and biochemical
methods had to be developed to permit expression of
the DNA fragments. In the pages ahead, we will explore
some products of that scientific ingenuity as we survey
the breakthroughs that formed the basis for the new
science of DNA technology.


FIGURE 4.1 Guiding the inheritance of genes. Through the centuries, humans used a wild species of a cabbage-like plant to breed all the modern plants shown here, including kohlrabi and cabbage. Depending on the climate and the desires of the local population, some varieties were bred to have a hard head as in a modern cabbage, some formed masses of flower buds as in broccoli and cauliflower, and some made clusters of leaf buds as in Brussels sprouts.


Humans have altered the genetic makeup of organisms for 
centuries by breeding plants and animals to create new 
organisms with the desired genetic and physical characteris- 
tics inherited from the parents. 


Microorganisms Are Used to Streamline 
DNA Technology 


Advances in DNA technology were made possible by 
seminal work with bacterial cells (Figure 4.2), which are 
cultivated easily and can be studied conveniently in the 
research lab. The most popular bacterium used in DNA 
technology is the “workhorse” Escherichia coli (E. coli). 
Having studied this bacterium for decades, scientists 
now know more about the biochemistry, morphology, 
physiology, and genetics of E. coli than we know about 
any other organism (including humans). As a result of 
such intense investigation, E. coli has become a model 
system for research designed to figure out how cells 
work. E. coli is also easy to grow in the lab, and, except 
for specific dangerous strains such as E. coli O157:H7,
E. coli is not a serious human pathogen. 


The bacterial chromosome is particularly well suited 
for DNA experiments because it is small compared to 
the large genomes in many eukaryotic cells, and most 
bacteria have a single double-stranded circular DNA 
chromosome in each cell. In eukaryotic cells (such as 
animal cells), each chromosome consists of a very long,
linear DNA molecule, and chromosomes typically occur in pairs.
Because there is only one chromosome in a bacterium, 
its genes are expressed without the influence of genes 
on a second copy of the chromosome. Bacteria do not 
have nuclei, but the circular bacterial chromosome is 
attached to the membrane inside the cell through the 
activities of special proteins that interact with the lipid 
membranes and also bind to the chromosome DNA. 
Eukaryotic chromosomes, by contrast, are contained in 
the nucleus of the cell at all times except during cell 
division (see Chapter 9). Because the genetic code is
nearly universal, the protein expression machinery in
the bacterium can, in principle, produce foreign proteins
encoded by genes from other species.


The relative simplicity of bacterial cells and chromosomes 
compared to their eukaryotic counterparts makes bacteria a
good choice for experimental work. 


Viruses have proven to be very useful in many
areas of DNA research. Viruses are microscopic
particles consisting of RNA or DNA genomes enclosed in
a protein coat (called a capsid), which in some viruses
is also surrounded by a lipid envelope (Figure 4.3).
Viruses replicate only inside living prokaryotic and
eukaryotic cells, where they shed their protein coats
and use the molecular machinery of the host cell to
produce new viruses. The genes of a DNA virus direct
the synthesis of new viral proteins. The RNA of an
RNA virus often masquerades as a cellular messenger
RNA molecule, tricking the cell into producing viral
enzymes and other viral proteins.


FIGURE 4.2 Bacteria in service to humanity. Electron micrographs of (A) E. coli, probably the most widely used bacterium in molecular biology, and (B) Pseudomonas, some strains of which can naturally break down chemical pollutants in the environment. Pseudomonas has also been engineered to assist in cleanup after oil spills damage the environment.

FIGURE 4.3 Some viruses are enclosed in a lipid membrane. (A) A cross section of an HIV particle. (B) HIV obtains a lipid envelope (membrane) as it exits a cell. (C) HIV (red) attacks an immune system cell.


Some types of viruses do not replicate themselves
immediately. Instead, they integrate their DNA into
the host’s chromosome DNA and become part of the
cell’s genome. The DNA of a herpesvirus, for example,
can integrate (be inserted) into a nerve cell genome
where it can remain within that cell for years, causing
periodic herpes infections in the body. A different
process occurs with the human immunodeficiency
virus (HIV), which causes AIDS. Once inside the cell,
the RNA from this virus serves as a template for the
synthesis of DNA, which is then inserted into the host
cell genome (Figure 4.4). Once it is part of the host
chromosome, the viral DNA is replicated as part of
the chromosome and is expressed as mRNA encoding
viral capsid proteins. The ability of viruses to integrate,
or insert, their DNA into host cell genomes drew the
attention of scientists, who began using viruses as
vectors to carry foreign genes into host cells. Numerous
organisms in addition to viruses and bacteria are used
in DNA technology, including yeast cells, various other
microbes and viruses, insect and mammalian tissues,
and a number of plants. We will mention their names
and significance as we encounter them in succeeding
chapters.


FIGURE 4.4 Retroviruses integrate into the chromosome. (1) The retrovirus enters the cell. (2) The retroviral genome is made of RNA and is copied into double-stranded DNA (dsDNA) by a viral enzyme called reverse transcriptase. The DNA copy integrates into the host cell genome; the integrated viral DNA is called the provirus. (3) Later the host RNA polymerase transcribes the provirus DNA to make RNA copies of the retrovirus. (4) Once packaged with proteins, the genomes form new retroviruses that exit the cell to infect new cells and continue the cycle.


Viruses, consisting of little more than a protein shell covering
a small genome, can be valuable allies in delivering DNA
into prokaryotic and eukaryotic cells.


Microbes Exchange DNA Naturally 


Research conducted in the 1950s demonstrated
conclusively that genetic recombination occurs among
bacteria and can even involve bacteria-attacking
viruses, called bacteriophages. The story began with
Griffith’s experiments in 1928 (see Chapter 1), which showed
that bacteria could be genetically transformed, leading
to Avery’s identification of DNA as the molecule
responsible for transformation. By the 1950s,
scientists knew of three ways that bacteria in the wild
can change their genetic makeup: transformation,
conjugation, and transduction. In nature,
transformation occurs when donor bacteria die, break open,
and release their contents, including DNA, into the
environment. The lucky recipient bacteria in the local
neighborhood take up pieces of DNA, which are
inserted into the bacterial chromosome (Figure 4.5).
In the wild, DNA uptake and transformation take
place in less than 1% of a bacterial population, but
can bring about profound changes in the population.
For example, one newly acquired DNA sequence might
increase the pathogenicity of the organism (as in Griffith’s
pneumococci), whereas a different one might affect the
efficiency of transfer of bacterial plasmids carrying
antibiotic resistance genes from one microbe to another
microbe in nature. A modified version of this
transformation process has become a standard method used to
introduce plasmid DNA into bacterial cells (discussed
later).

Joshua Lederberg, Francois Jacob, and Elie
Wollman established that bacteria can transfer genetic
information from one bacterium to another by a
process called conjugation (Figure 4.6). During
conjugation, the donor and recipient bacterial cells are joined
to each other by a cytoplasmic bridge structure called
a pilus. Single-stranded chromosome DNA from the
donor bacterium crosses the cytoplasmic bridge into
the recipient cell, where it either integrates into the
recipient’s chromosome or is copied into a double-stranded
circular DNA molecule that can replicate like
a plasmid. The newly acquired donor genes are then
expressed in the recipient cell. It is important to note
that conjugation and DNA transfer have been
demonstrated between cells from different genera of bacteria,
such as Salmonella and Shigella.


FIGURE 4.5 DNA used in transformation is picked up by bacterial cells and integrated into the host chromosome. (A) When a cell dies, it releases its DNA into the surrounding environment (donor cell). (B) A nearby cell takes up the DNA (recipient). (C) One strand of the incoming double-stranded DNA is removed by enzymes, and the resulting single-stranded DNA displaces one strand of the recipient’s DNA. (D) The donor DNA has been integrated into the host recipient chromosome.

FIGURE 4.6 Conjugation in bacteria. (A) Two Escherichia coli cells are joined by a cytoplasmic bridge called a pilus that allows DNA to move from a donor cell to a recipient cell. One cell has produced three pili for conjugation with three cells. Conjugation provides an opportunity for the transfer of genetic material from one cell to another, but it is a one-sided transfer, not an exchange. This is one way that bacteria acquire genes for drug resistance. (B) A small circular DNA molecule called the F-plasmid is one of several types of conjugative plasmids that produce the pilus; the single-stranded F-plasmid DNA transfers through the pilus to a recipient bacterium that does not have the conjugative F-plasmid.


Bacteria exchange DNA naturally in the processes known 
as transformation and conjugation. The DNA that is trans- 
ferred often remains physically separate from the bacterial 
chromosome and forms a small, independent circle of dou- 
ble-stranded DNA called a plasmid. 
FIGURE 4.7 Transduction of bacterial DNA by bacteriophages. A phage can incorporate a piece of bacterial host genome into its own genome, spreading bacterial genes along with viral infection. When a phage that has picked up some bacterial DNA infects another cell, the DNA from the previous bacterium is integrated into the subsequent bacterium’s genome along with the viral DNA. Thus, genes can be transferred from one species of bacteria to another by bacteriophages.


Joshua Lederberg and Norton Zinder discovered a 
third form of bacterial DNA transfer in the early 1950s
called transduction (Figure 4.7). In this process, bac- 
teriophages “accidentally” transfer chromosome DNA 
among bacterial cells. The bacteriophage (or phage, for 
short) infects the bacteria by attaching to the cells and 
entering through the cell wall (see Chapter 5). Once 
inside the cell, the phage either replicates its DNA
and packages it into newly synthesized capsid proteins,
which are released as phage progeny, or integrates its
genome into the bacterial chromosome and becomes part
of that chromosome. When the phage DNA is excised from
the chromosome, it occasionally brings along a piece of 
the bacterial chromosome DNA that is connected to the 
phage DNA. The bacterial DNA is replicated along with 
the phage DNA and packaged with the phage genome 
into the new capsids. Later the newly produced phage 
will be released from the cell and go on to infect other 
bacteria, continuing the transduction process. 

Transduction is an infrequent event in nature, but 
the potential for transduction is great for viruses that 
integrate into host cell genomes. In some cases, entire
genes are transduced. Examples of pathogens that carry
bacteriophage DNA include the diphtheria bacterium,
which harbors a bacteriophage carrying the gene for a
highly destructive toxin protein, and Salmonella
bacteria, which also produce toxins that can cause
serious food-borne infections.


When viruses infect bacteria, they sometimes bring with 
them DNA from previously infected bacteria along with their 
own DNA. This means of DNA transfer between bacterial 
species is known as transduction. 


The biological processes of transformation, con- 
jugation, and transduction permit bacteria to natu- 
rally acquire new DNA sequences and assume new 
genetic characteristics. In the 1950s, molecular biolo- 
gists first began using the word “recombined” to refer 
to genetically altered bacteria containing foreign 
DNA. Gradually, the term recombinant DNA crept 
into the scientific lexicon and came to refer to a DNA
molecule containing foreign DNA sequences linked to host
DNA sequences. Scientists knew that recombination 
between the DNA molecules from different species 
occurs in nature, and they began to wonder whether 
they could perform similar DNA recombination experi- 
ments in the laboratory. 


Box 4.1 Superbugs 


The Rise of Antibiotic Resistance 
When antibiotics were first introduced into medical prac- 
tice in the 1930s and 1940s, they were a real miracle. 
Suddenly it became possible to cure previously incurable 
widespread diseases caused by pathogenic bacteria. In the 
early 1900s, tuberculosis (TB), called “consumption” at 
the time, was responsible for one of every four deaths in 
England. Because tuberculosis is infectious, people with 
the disease were isolated in medical facilities to help curb 
its spread; however, fully one-half the people who entered 
a sanatorium died from TB. 

The horrible toll taken by TB changed dramatically in
the Western world with the advent of antibiotics and a
vaccine against tuberculosis for children; tuberculosis was no
longer a public menace. But in Africa, TB is still a killer, in
part because of scarcer medical resources and higher rates 
of HIV/AIDS, which renders people more susceptible to TB. 
Additionally, over the past two decades many pathogenic 
microbes, including some that cause TB, have become 
increasingly resistant to antibiotic treatment. We are poten- 
tially facing a return to the days when bacterial infections 
were not easily cured. 

Antibiotics kill bacteria by disrupting key cellular pro- 
cesses that are required for the bacteria to survive and 
reproduce, such as DNA and RNA synthesis, protein syn- 
thesis, enzymatic systems, and cell wall maintenance. 
Importantly, antibiotics target processes specific to bacteria;
they do not work against viruses and have relatively little
effect on eukaryotic cells.

Bacteria can acquire specific mutations that render 
them resistant to the action of antibiotics. At a rate of about 
1 in 1 billion, bacteria experience a chance series of muta- 
tions that lead to resistance. This can take several forms, the 
most common being the modification of an existing bacte- 
rial enzyme so that it alters the structure of the antibiotic, 
rendering it harmless. Other mutations change existing 
molecular pumps so they eject the antibiotic from the cell, 
or modify enzymes in the pathways under attack (for exam- 
ple, the enzymes involved in DNA synthesis) that make 
them resistant to the antibiotic. Once a bacterium is resist- 
ant to an antibiotic, it can spread this ability by transferring 
the appropriate DNA through the processes of transduction, 
transformation, or conjugation, as well as by cell division. 
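A rate of about 1 in 1 billion sounds negligible until it is multiplied by the size of a bacterial population. The short Python sketch below is only an illustration under stated assumptions: each cell is assumed to acquire resistance independently at the rate quoted above, so the number of resistant cells is treated as approximately Poisson-distributed, and the population sizes are hypothetical. The function name `resistance_odds` is our own.

```python
import math


def resistance_odds(population: int, rate: float = 1e-9) -> tuple[float, float]:
    """Return (expected number of resistant cells, probability of at least one).

    Assumes every cell independently acquires resistance with probability
    `rate`, so the number of resistant cells is approximately Poisson-distributed.
    """
    expected = population * rate
    # For a Poisson count, P(no resistant cells) = exp(-expected).
    p_at_least_one = 1.0 - math.exp(-expected)
    return expected, p_at_least_one


# Hypothetical population sizes, chosen only for illustration.
for cells in (10**6, 10**9, 10**10):
    expected, p_any = resistance_odds(cells)
    print(f"{cells:,} cells: about {expected:.3g} resistant expected, "
          f"P(at least one) = {p_any:.3f}")
```

In a population of ten billion cells, about ten resistant cells are expected, so at least one is almost certain to be present; once an antibiotic removes the susceptible cells, those rare survivors can take over.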

Recently, large increases in antibiotic resistance exceed 
the frequency accounted for by the bacterial mutation rate 
alone. The widespread addition of antibiotics to feed for 
healthy livestock as well as the tendency for the public to 
demand antibiotic medications to treat viral illnesses (anti- 
biotics do not treat virus infections) have encouraged the 
widespread growth of bacteria that are resistant to more 
than one antibiotic, known informally as “superbugs.” 
Ongoing efforts to solve this problem include restrictions 
on the routine use of antibiotics, reductions of antibiotics in 
some livestock feed, and improved public education about 
the correct use of antibiotics, among others. The race is on, 
and we must not let the superbugs win. 


Restriction Enzymes Enable Accurate and 
Controlled Cutting of DNA 


In the 1950s, Salvador Luria and his colleagues 
observed that E. coli bacteria could somehow resist 
infection by bacteriophages by “restricting” (i.e., inhibit- 
ing) replication of the invading phage. By 1962, Werner 
Arber and his research group had found an enzyme 
system in bacteria that could inhibit phage replication 
by cleaving the bacteriophage DNA once it entered the 
cell. The team isolated the DNA-cleaving enzyme from
E. coli cells and called it an endonuclease, because this
enzyme cleaves nucleic acids. Endonucleases cut the
DNA from within the DNA molecule, while exonucleases
digest the ends of the DNA molecule. In bacteriophage
infections, the E. coli endonuclease cleaves the
invading phage DNA but does not cut the host bacterial
DNA. This is the “restriction” part of what we now
know as the bacterial restriction and modification 
system, which protects the bacteria against bacteri- 
ophage infection. The endonuclease enzyme can dis- 
tinguish between the invading phage DNA and the 
host DNA because the bacterial chromosome is modi- 
fied during replication by adding methyl (CH3-) groups 
to specific DNA bases. This methyl “tag” identifies the 
methylated DNA as the native chromosome and pro- 
tects it from cleavage by the enzymes. The phage DNA, 
however, is unmethylated and is a target for the endo- 
nuclease to “restrict” or cut the phage DNA to prevent 
replication of the phage. 


Endonuclease enzymes that occur naturally and function 
in the restriction and modification system in bacteria
quickly became known as restriction enzymes. These spe- 
cial enzymes play an important role in microbial warfare, 
but they were also essential for the early development of 
the infant science of recombinant DNA technology. 


In 1970, Hamilton Smith and his colleagues isolated 
a new restriction enzyme called HindIII from the
bacterium Haemophilus influenzae. Daniel Nathans used
the HindIII restriction enzyme to cut viral DNA isolated
from simian virus capsids (SV40, Figure 4.8). SV40 is
an animal virus with a small circular double-stranded
DNA genome, which is quite a bit easier to manipulate
and study than the much larger bacterial chromosomes.
Smith discovered that the HindIII restriction enzyme
cut each SV40 DNA genome at only one site, even
when an excess of enzyme was present. The scientists
were amazed to learn that the HindIII enzyme cut the
SV40 DNA in exactly the same location along the DNA 
in every SV40 DNA molecule that was exposed to the 
enzyme (Figure 4.9). For their work, Arber, Smith, and 
Nathans were awarded the Nobel Prize in physiology 
or medicine in 1978 (Figure 4.10). 

As scientists gained more hands-on experience
with restriction enzymes in the lab, they learned that
restriction enzymes could cut any DNA molecule on
demand, regardless of the source of the DNA (except
when the DNA was protected by chemical
modification). For example, the HindIII restriction enzyme will
readily cleave any DNA molecule that contains its
specific HindIII recognition DNA sequence, regardless
of the source of the DNA molecule: virus, plant,
animal, or human. It is easy to appreciate why restriction
enzymes quickly became essential tools for any type
of gene research and for the emerging field of
biotechnology. At first, individual research groups had to
isolate their own restriction enzymes from bacteria if
they wanted to cut DNA in an upcoming experiment.
This was often easier said than done, however, as it was
necessary to grow a myriad of different bacterial
species to obtain access to the required enzymes. Many
microorganisms are notoriously difficult to grow in the
lab, often requiring special conditions such as
particular additives to the media or extreme temperatures. In
response to the increasing demand, it was not long
before biotechnology companies began to sell dozens
of restriction enzymes for research applications,
allowing scientists to plan experiments based on restriction
enzymes they could purchase.


FIGURE 4.8 SV40, an animal virus. (A) The structure of SV40, a polyomavirus with a small, double-stranded circular DNA genome that infects animal cells. (B) Electron micrograph of SV40 capsids.

Figure 4.9 panel text: the HindIII recognition site, 5'-AAGCTT-3' (top strand) paired with 3'-TTCGAA-5' (bottom strand). Like many restriction enzymes, HindIII cuts in a staggered fashion, between the A and AGCTT on each strand (5'-A↓AGCTT-3').

FIGURE 4.9 Circular SV40 genome is cut by HindIII. The HindIII restriction enzyme cuts the SV40 circular DNA at a specific DNA sequence, which happens to occur only once in SV40 DNA. This converts the circular DNA into a linear double-stranded DNA molecule. The SV40 genome contains about 5000 base pairs (5 kb), whether circular or linear.

Scientists in the late 1990s could choose from
hundreds of commercially available, highly purified
restriction enzymes for use in DNA technology. To avoid
confusion about the identity of an enzyme, its
cleavage site, or its source organism, restriction enzymes
are named according to a system that starts with the
name of the bacterium from which they were isolated.
The first letter of the enzyme name is the first letter of
the bacterium’s genus name (in italics), followed by the
first two letters of its species (in italics), then a letter
representing the bacterial strain, and finally a Roman
numeral that signifies the order in which this enzyme
was discovered (in organisms that have more than one
restriction enzyme). For example, the EcoRI restriction
enzyme was named as follows:


E    Escherichia (genus; capitalized, in italics)
co   coli (species; not capitalized, in italics)
R    RY13 (strain)
I    First enzyme identified in this bacterium (Roman numerals give the order in which the enzymes were identified)

FIGURE 4.10 Scientists who won the Nobel prize for discovering restriction enzymes. (A) Werner Arber, (B) Hamilton Smith, and (C) Daniel Nathans, winners of the 1978 Nobel Prize in physiology or medicine for their work with restriction enzymes.


Each restriction enzyme cuts DNA only at a specific
base sequence, called the recognition sequence (Table
4.1), which ranges in length from 4 bp to 16 or 20 bp
for a few rare enzymes. A given recognition sequence,
especially a long one, may occur only rarely in a genome of
millions or billions of DNA base pairs. So how do enzymes find
their recognition sequences? Restriction enzymes adopt specific
3D shapes that progress along nonspecific sequences
in a DNA helix, “scanning” until reaching their specific
recognition sequence. Most restriction enzymes
recognize and cut at a palindromic DNA sequence with two-fold
rotational symmetry. The recognition site has the
same base sequence on the top strand (read 5’ to 3’,
left to right) of the DNA and on the bottom strand
(read 5’ to 3’, right to left) of the DNA. In the English
language, a palindrome is a word or sentence whose letters
spell the same words whether they are read forward
or backward; for example, “Madam, I’m Adam”
or one author’s current favorite, “Sit on a potato pan,
Otis.” Note that in DNA palindromes, both strands
of the DNA must be read to reveal the palindrome
(see Table 4.1).
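The rule that both strands must be read can be checked with a few lines of Python. In this minimal sketch (the function names are illustrative, and sequences are assumed to contain only A, C, G, and T), a site counts as a DNA palindrome when its top strand, read 5’ to 3’, equals its bottom strand read 5’ to 3’, which is the reverse complement of the top strand.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the bottom strand of `seq`, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_dna_palindrome(site: str) -> bool:
    """True if the top strand read 5'->3' matches the bottom strand read 5'->3'."""
    return site.upper() == reverse_complement(site)


for site in ("GAATTC", "GGATCC", "AAGCTT", "GATTAG"):
    print(site, is_dna_palindrome(site))
```

The EcoRI, BamHI, and HindIII sites all pass. GATTAG reads the same backward on one strand, like an English palindrome, but fails the test because its bottom strand reads CTAATC.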

The BamHI restriction enzyme attaches to the DNA 
helix as a dimer (two BamHI proteins bind together) 
and scans the DNA helix looking for the recognition 
sequence, 5’-GGATCC-3’ (Figure 4.11A). BamHI cuts 
the DNA backbone at the same point along the rec- 
ognition sequence on each strand, but the locations 
of the cuts are staggered on the DNA helix, leaving 
overhanging, single-stranded ends after the DNA is cut 
(Figure 4.11B). BamHI screens the sequences while 
loosely bound to the DNA helix until the enzyme finds 
the recognition sequence, then the proteins change 
conformation so they fit tightly into the DNA helix, 
which stimulates the enzyme to cut the DNA (Figure 
4.12). 
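The search for a recognition sequence, and the reason long sites are rare, can both be sketched in Python. The sketch below makes simplifying assumptions that real genomes do not meet exactly: the expected count assumes a random sequence with equal proportions of A, C, G, and T, so a particular n-base site turns up about once every 4^n base pairs. The function names and the example sequence are hypothetical. Because the BamHI site is palindromic, scanning the top strand also finds every site on the bottom strand; a non-palindromic site would also require a search of the reverse complement.

```python
def find_sites(sequence: str, site: str) -> list[int]:
    """0-based start positions of every copy of `site` in the top strand,
    found by sliding along the sequence one base at a time."""
    sequence, site = sequence.upper(), site.upper()
    n = len(site)
    return [i for i in range(len(sequence) - n + 1) if sequence[i:i + n] == site]


def expected_sites(genome_length: int, site_length: int) -> float:
    """Expected copies of one particular site in a random sequence in which
    A, C, G, and T are equally common: about genome_length / 4**site_length."""
    return (genome_length - site_length + 1) / 4 ** site_length


print(find_sites("TTGGATCCAAGGATCCTT", "GGATCC"))  # [2, 10]
for n in (4, 6, 8):
    # Hypothetical 1,000,000-bp random genome.
    print(f"{n}-bp site: about {expected_sites(1_000_000, n):.0f} copies")
```

Each additional base in the recognition sequence makes a site about four times rarer, which is why enzymes with 8-bp or longer sites cut large genomes into only a few pieces.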

Restriction enzymes cut DNA at specific base 
sequences and as a result produce one of three differ- 
ent types of ends on the DNA: blunt, 5’ cohesive ends, 
or 3’ cohesive ends (Figure 4.13). The cohesive DNA 
ends generated by restriction enzymes are also called 
sticky, staggered, overlapping, and overhanging DNA 
ends. The single-stranded DNA ends can only form 
base pairs with the complementary single-stranded 
ends generated by the same or a different enzyme 
(Figure 4.14). 
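Which kind of end an enzyme leaves follows from where it cuts inside its site. The Python sketch below assumes a palindromic site, so the bottom strand is cut at the mirror-image position of the top-strand cut; the cut positions follow the arrows in Table 4.1. The names `Enzyme`, `end_type`, and `sticky_ends_pair` are our own. BglII (A↓GATCT) is not in Table 4.1; it is added as a well-known example of a different enzyme whose ends can pair with BamHI ends.

```python
from typing import NamedTuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Bottom strand of `seq`, read 5' to 3' (assumes only A, C, G, T)."""
    return seq.translate(_COMPLEMENT)[::-1]


class Enzyme(NamedTuple):
    name: str
    site: str  # recognition sequence, top strand 5'->3' (assumed palindromic)
    cut: int   # top-strand cut position, in bases from the 5' end of the site


def end_type(enzyme: Enzyme) -> tuple[str, str]:
    """Return (kind of end, single-stranded overhang) left by the enzyme.

    For a palindromic site the bottom strand is cut at the mirror-image
    position, len(site) - cut, measured along the top strand.
    """
    mirror = len(enzyme.site) - enzyme.cut
    if enzyme.cut < mirror:
        return "5' overhang", enzyme.site[enzyme.cut:mirror]
    if enzyme.cut > mirror:
        return "3' overhang", enzyme.site[mirror:enzyme.cut]
    return "blunt", ""


def sticky_ends_pair(a: Enzyme, b: Enzyme) -> bool:
    """True if ends cut by `a` and `b` have complementary single strands."""
    kind_a, over_a = end_type(a)
    kind_b, over_b = end_type(b)
    if kind_a == "blunt" or kind_a != kind_b:
        return False
    return over_a == reverse_complement(over_b)


BAMHI = Enzyme("BamHI", "GGATCC", 1)   # G|GATCC
ECORI = Enzyme("EcoRI", "GAATTC", 1)   # G|AATTC
PSTI = Enzyme("PstI", "CTGCAG", 5)     # CTGCA|G
HAEIII = Enzyme("HaeIII", "GGCC", 2)   # GG|CC
BGLII = Enzyme("BglII", "AGATCT", 1)   # A|GATCT (not in Table 4.1)

for enzyme in (BAMHI, ECORI, PSTI, HAEIII, BGLII):
    print(enzyme.name, *end_type(enzyme))
print(sticky_ends_pair(BAMHI, BGLII), sticky_ends_pair(BAMHI, ECORI))
```

BamHI and BglII both leave a 5’ GATC overhang, so their ends can base pair even though the two enzymes recognize different sites. EcoRI’s AATT overhang cannot pair with either, and HaeIII leaves blunt ends.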


Restriction enzymes cut DNA at specific sequences, regard- 
less of the source of the DNA, and often cut at staggered 
sites along the double helix, leaving short, complementary 
“sticky ends” on the DNA. 


The use of restriction enzymes as tools to cut and 
then connect DNA fragments from different organ- 
isms is a central approach used in genetic engineer- 
ing. Restriction enzymes are precise molecular tools 
that scientists use to cut any kind of DNA regard- 
less of source. When the goal of the experiment is to 
connect two DNA fragments together, the source of 
the DNA (bacterium, bird, or buttercup) is not relevant 
as long as the sticky ends of the DNA fragments are 
complementary and can base pair to hold
the DNA fragments together. It is important to note that
not all restriction enzymes cut DNA and leave sticky 
ends. Many restriction enzymes cut the DNA back- 
bone at two sites directly across from each other on 
the helix and generate blunt-ended DNA fragments 
that lack single strands. 


Sealing the Deal: DNA Ligases Connect 
the DNA Backbones 


Bringing together the complementary sticky ends on
two DNA fragments does not of itself forge a covalent
bond between the DNA ends (Figure 4.14). After base
pairing occurs, a gap or “nick” remains in each of
the DNA backbones, so the molecules are held
together only by complementary base pairing. The hydrogen
bonds between the base pairs are not strong
enough to stabilize the overlapping DNA ends
indefinitely at physiological (body) temperatures. This
problem is solved using another enzyme isolated from
cells, called DNA ligase, which catalyzes the formation
of the strong covalent chemical bonds in the
backbones of the two DNA helices.


TABLE 4.1 Some Restriction Enzymes, Their Sources, and Their Recognition Sites

Enzyme    Microbial Source                  Sequence*
AluI      Arthrobacter luteus               5'-AG↓CT-3'
                                            3'-TC↑GA-5'
BamHI     Bacillus amyloliquefaciens H      5'-G↓GATCC-3'
                                            3'-CCTAG↑G-5'
EcoRI     Escherichia coli                  5'-G↓AATTC-3'
                                            3'-CTTAA↑G-5'
EcoRII    Escherichia coli                  5'-↓CCTGG-3'
                                            3'-GGACC↑-5'
HaeIII    Haemophilus aegyptius             5'-GG↓CC-3'
                                            3'-CC↑GG-5'
HindIII   Haemophilus influenzae            5'-A↓AGCTT-3'
                                            3'-TTCGA↑A-5'
PstI      Providencia stuartii              5'-CTGCA↓G-3'
                                            3'-G↑ACGTC-5'
SalI      Streptomyces albus                5'-G↓TCGAC-3'
                                            3'-CAGCT↑G-5'

*Arrows mark the positions where the enzyme cuts the top (↓) and bottom (↑) strands.

In the lab, scientists typically use T4 DNA ligase, 
which is made in large amounts in bacterial cells 
infected with T4 bacteriophage. The T4 DNA ligase
enzyme connects (ligates) the backbones of DNA
strands together by forming a covalent phosphodiester
bond between the phosphate group on the 5’ carbon of one
nucleotide and the hydroxyl group on the 3’ carbon of the adjacent
nucleotide (Figure 4.15 and Figure 4.16). This covalent
bond seals the DNA backbones together and creates 
a longer DNA molecule. DNA ligase can also link 
together blunt-ended DNA fragments, but it is much 
more difficult to join them together because they lack 
the stabilizing influence of the complementary sticky 
ends on both DNA fragments. 
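The rule that ligase joins sticky ends most easily when their single-stranded bases can first pair with each other can be written as a small check. This is a simplified sketch, not a model of ligase chemistry: it assumes each end is described only by its type and its overhang (written 5' to 3'), it treats any two blunt ends as joinable, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def can_anneal(end_a: tuple[str, str], end_b: tuple[str, str]) -> bool:
    """Return True if two DNA ends could be joined by DNA ligase.

    Each end is (kind, overhang), with the overhang written 5'->3'.
    Sticky ends must be the same kind and carry complementary
    single-stranded bases; blunt ends can always be joined,
    although ligation is much less efficient.
    """
    kind_a, tail_a = end_a
    kind_b, tail_b = end_b
    if kind_a != kind_b:
        return False
    if kind_a == "blunt":
        return True
    return tail_a == reverse_complement(tail_b)


ecori = ("5' overhang", "AATT")
bamhi = ("5' overhang", "GATC")
psti = ("3' overhang", "TGCA")
alui = ("blunt", "")

print(can_anneal(ecori, ecori))  # True
print(can_anneal(ecori, bamhi))  # False
print(can_anneal(alui, alui))    # True
```

Two EcoRI ends pair because AATT is its own reverse complement, whereas an EcoRI end and a BamHI end cannot pair even though both have four-base 5' overhangs.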


DNA molecules with sticky ends base pair with each other 
temporarily; the DNA ligase enzymes connect the two 
DNA molecules together end to end by forming covalent 
backbone bonds between them. 


The DNA fragments produced when restriction enzymes
cut DNA are often analyzed by gel electrophoresis. In
this common technique, the DNA sample is loaded into
a well (a depression) at one end of a solid gel. When
an electrical current is applied to the gel, the DNA
fragments, which are negatively charged, migrate
through the gel toward the positive electrode. Shorter
DNA fragments migrate through the gel more quickly
than longer ones, so the fragments separate by size.
The DNA in the gel is stained with ethidium bromide
and visualized under ultraviolet light (Figure 4.11 B).
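A restriction digest followed by electrophoresis can be imitated on paper: find each recognition site, cut there, and list the fragments in the order the bands would appear from the well downward. The sketch below is a simplification that assumes complete digestion of a linear molecule and counts lengths on the top strand only. The sequence and the function names are hypothetical.

```python
import re


def digest_linear(seq: str, site: str, cut: int) -> list[int]:
    """Return fragment lengths from cutting a linear DNA at every site.

    `cut` is the number of bases of the site that lie before the cut on
    the top strand (1 for EcoRI, G^AATTC). Lengths are counted on the
    top strand only, ignoring the single-stranded overhangs.
    """
    # A lookahead pattern also finds overlapping occurrences of the site.
    starts = [m.start() for m in re.finditer(f"(?={site})", seq)]
    edges = [0, *(s + cut for s in starts), len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]


def gel_order(lengths: list[int]) -> list[int]:
    """Order bands from the well downward: longest (slowest) first."""
    return sorted(lengths, reverse=True)


# Hypothetical 30-bp sequence with two EcoRI sites (GAATTC).
dna = "ATGC" + "GAATTC" + "CCCCCCCCCC" + "GAATTC" + "TTTT"
frags = digest_linear(dna, "GAATTC", 1)
print(frags)             # [5, 16, 9]
print(gel_order(frags))  # [16, 9, 5]
```

Because shorter fragments travel farther, the 5-base fragment ends up closest to the positive electrode and the 16-base fragment stays nearest the well.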


Plasmids Are Used as DNA Carriers 


In the early 1970s, Paul Berg and his team at Stanford
University began to study enzymes that cut E. coli
chromosomal DNA, with the goal of using restriction
enzymes to cut and connect DNA from different
sources. At the time, Berg’s group was using a
restriction enzyme that leaves blunt ends on the DNA
fragments, making the ends difficult to ligate together
(enzymes that create sticky ends were not yet known).
They overcame this problem by adding artificial sticky
ends to the DNA fragments: they added a tail of all “A”
bases to the end of one DNA fragment and a tail of all
“T” bases to the end of the other. When the tailed DNA
fragments were mixed together, the single-stranded A
and T tails formed A-T base pairs, allowing the ends of
the DNA fragments to overlap and base pair. Joining
these two DNA fragments from different organisms
produced the first recombinant DNA molecule. Paul
Berg shared the 1980 Nobel Prize in chemistry for this
remarkable feat, which foreshadowed the use of plasmids
as vectors to carry recombinant DNA molecules.


FIGURE 4.11 The BamHI enzyme recognizes and cuts double-stranded DNA at GGATCC. BamHI, like most restriction enzymes, recognizes and cuts DNA at a site with two-fold rotational symmetry; that is, the base sequence reads the same 5’ to 3’ on both strands (left to right on the top strand and right to left on the bottom strand). The BamHI cleavage site is shown in red, and cutting generates sticky ends, also known as cohesive or complementary ends, on the resulting DNA fragments.


FIGURE 4.12 The BamHI enzyme dimer (two protein subunits) binds to DNA. The BamHI dimer binds to nonspecific DNA sequences while scanning the genome for occurrences of its specific recognition sequence. (A) DNA is shown end-on (looking down the double helix). The BamHI dimer binds to the DNA double helix but cannot cut the DNA until it binds to the BamHI recognition sequence. (B) When the BamHI dimer finds the recognition sequence, it binds more closely to the double helix and catalyzes DNA cleavage.

Meanwhile, Herbert Boyer’s research group isolated
the EcoRI restriction enzyme from E. coli bacteria for
the first time and discovered that EcoRI cuts DNA to
leave sticky ends with 5’ overhangs. At the same time,
in Stanley Cohen’s laboratory at Stanford University,
scientists had been studying small circular
double-stranded DNA molecules called plasmids
(Figure 4.17), which are found in bacteria but not in
most eukaryotic organisms (we now know that some
eukaryotic cells do carry plasmids). In the cell,
plasmid circles adopt a compact, twisted (supercoiled)
shape (see Figure 4.17). Plasmids carry relatively few
genes, and the number varies from plasmid to plasmid
(most bacterial chromosomes carry a few thousand
genes). Plasmid genes usually are not essential for
bacterial growth, so a plasmid can be lost without
significant harm to the bacterium. Plasmids replicate
autonomously (separately from the bacterial chromosome)
and usually do not integrate into the bacterial
chromosome (Figure 4.18).


FIGURE 4.13 Cleavage by restriction enzymes creates three different types of DNA ends. Different restriction enzymes cut DNA at different recognition sequences. Cleavage generates one of three types of DNA ends: 5’ staggered ends, blunt ends, or 3’ staggered ends.


FIGURE 4.14 Base pairing between DNA bases holds complementary DNA strands together. Hydrogen bonds (H-bonds) hold the base pairs together: A pairs with T, and G pairs with C, as shown. Note that A-T base pairs contain 2 H-bonds and G-C base pairs contain 3 H-bonds.

In 1972, Cohen constructed a new DNA plasmid
(Figure 4.19) with several useful features: (1) an EcoRI
recognition site at a single location on the plasmid
DNA circle; (2) an origin of replication, which allows
the plasmid to replicate in bacteria; and (3) a
tetracycline antibiotic resistance gene, so that bacteria
carrying this plasmid would resist the antibiotic
tetracycline and survive, whereas bacteria lacking the
plasmid would be killed by it. In the early days of this
emerging technology, scientists named their DNA
inventions after themselves; hence this plasmid is
called pSC101 (“SC” for Stanley Cohen).
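Why does a single EcoRI site matter? Cutting a circle once opens it into one linear molecule with a sticky end at each end, whereas two sites would split the circle into two pieces. The hedged sketch below counts sites on a circular sequence, including a site that straddles the arbitrary point where the circle is written out as a string. The toy sequence is invented, not real pSC101 DNA, and the search assumes a palindromic site such as EcoRI’s, so scanning one strand finds the sites on both.

```python
def sites_in_circle(plasmid: str, site: str) -> list[int]:
    """Find every start position of `site` in a circular DNA sequence.

    Appending the first len(site) - 1 bases lets the search catch a
    site that spans the point where the circle was 'opened' for writing.
    """
    wrapped = plasmid + plasmid[: len(site) - 1]
    return [i for i in range(len(plasmid)) if wrapped.startswith(site, i)]


def linear_products(plasmid: str, site: str) -> int:
    """Number of linear pieces after cutting a circle at every site.

    Cutting a circle n times gives n linear pieces; with no site the
    plasmid stays circular (0 linear pieces).
    """
    return len(sites_in_circle(plasmid, site))


# Hypothetical toy plasmid whose only EcoRI site straddles the end of
# the written sequence: ...GAA | TTC...
toy = "TTCAGGCTAGCTTACGGAA"
print(sites_in_circle(toy, "GAATTC"))  # [16]
print(linear_products(toy, "GAATTC"))  # 1
```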

Cohen studied the best way to insert the pSC101
plasmid DNA into host bacterial cells using an
artificial version of the natural process of
transformation. He grew the bacterial cells to a
specific cell density and treated them with ice-cold
calcium chloride buffer to make them competent to take
up foreign DNA. The cells, mixed with the recombinant
vector DNA, were then subjected to a sudden, temporary
increase in temperature, or “heat shock,” at 42°C for
45 seconds. The mechanism behind heat shock is not fully
understood, but only a few bacterial cells take up a
plasmid, and almost all of those take up just one
plasmid DNA molecule. This fact is fundamental to DNA
cloning, because it allows a culture to produce large
numbers of identical copies of the same plasmid. The
conditions for bacterial transformation are
surprisingly strict, and little variation is tolerated
if transformation is to succeed (Figure 4.20).
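The observation that almost every transformed cell carries just one plasmid is what we would expect when transformation is rare. A back-of-the-envelope sketch, under an assumption of our own rather than a statement about the actual mechanism: if plasmid uptake is independent from cell to cell, the number of plasmids per cell follows a Poisson distribution, and with a small mean the fraction of transformants carrying two or more plasmids is roughly half that mean. The uptake rate below is a made-up figure.

```python
import math


def poisson(k: int, mean: float) -> float:
    """Probability that a cell takes up exactly k plasmids, assuming
    uptake events are independent and Poisson-distributed."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def multiple_uptake_fraction(mean: float) -> float:
    """Among cells that took up at least one plasmid, the fraction that
    took up two or more."""
    transformed = 1 - poisson(0, mean)
    single = poisson(1, mean)
    return (transformed - single) / transformed


# Hypothetical average: 1 plasmid taken up per 1,000 cells.
mean_uptake = 0.001
print(f"fraction of cells transformed: {1 - poisson(0, mean_uptake):.6f}")
print(f"transformants with 2+ plasmids: {multiple_uptake_fraction(mean_uptake):.6f}")
```

With this hypothetical rate, the second value comes out close to 0.0005, or about one transformant in 2,000 carrying two plasmids.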

Once inside the cell, the plasmid replicates and forms
many identical copies of itself (hundreds, for some
plasmids). Whatever genes are carried on the plasmid
DNA, whether of bacterial or foreign origin, are
replicated as part of the plasmid. As the bacterium
harboring the plasmid rapidly multiplies, each new cell
inherits a number of plasmids. Before long, a single
bacterium has produced a culture of millions of
descendants, each carrying identical copies of the
plasmid DNA. Such a population of cells derived from a
single parent cell is called a clone (Figure 4.20).
Together, the cloned cells carry millions of copies of
identical plasmids. It did not take much imagination to
realize that plasmids could become ideal DNA carriers,
or vectors, for transporting foreign genes. The stage
was set for recombinant DNA experiments that would not
only change science forever but also alter the course
of history.
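The phrase “before long” can be made concrete with doubling arithmetic: n rounds of cell division turn one cell into 2^n cells. The sketch below assumes a doubling time of about 20 minutes, a commonly quoted figure for E. coli under ideal laboratory conditions that is used here only for illustration, and it ignores cell death and nutrient limits. The function name is hypothetical.

```python
import math


def generations_needed(target_cells: int, start_cells: int = 1) -> int:
    """Smallest number of doublings for start_cells to reach target_cells."""
    return math.ceil(math.log2(target_cells / start_cells))


# Assumption: ~20-minute doubling time, used only for illustration.
DOUBLING_MINUTES = 20

n = generations_needed(1_000_000)
print(n, "doublings")                                # 20 doublings
print(round(n * DOUBLING_MINUTES / 60, 1), "hours")  # 6.7 hours
print(f"{2**n:,} cells")                             # 1,048,576 cells
```

Under these assumptions, a single transformed cell gives rise to more than a million plasmid-carrying descendants in well under a working day.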


Plasmid DNA molecules are closed double-stranded circles 
that can be easily inserted into bacterial cells. Plasmids 
replicate independently of the bacterial chromosome and 
make numerous identical copies of the plasmid DNA in 
the cell. 
FIGURE 4.15 DNA ligase closes the gaps in the DNA backbones. (A) When two DNA fragments are brought together, a covalent bond seals the backbones and prevents the DNA fragments from coming apart easily. (B) The DNA ligase enzyme joins the phosphate on the 5’ end of one DNA fragment to the hydroxyl on the 3’ end of the other DNA fragment, forging the new phosphodiester bond with energy supplied by a cofactor (ATP, in the case of T4 DNA ligase). The result is a single DNA double-helix molecule.
FIGURE 4.16 Ligase enzymes seal the phosphodiester backbone. After the single-stranded complementary DNA ends base pair together, ligase enzymes seal the nick that remains in the DNA backbone. Once the nicks in the DNA backbone are sealed, these particular DNA fragments will remain connected until they are cut again with EcoRI. (Other enzymes would cut at different recognition sites.)
FIGURE 4.17 Plasmids are double-stranded DNA circles. (A) The relaxed circle form of a double-stranded DNA plasmid (upper), which, in a cell, is usually found in a supercoiled form (lower). (B) The map of a simple DNA plasmid containing genes coding for antibiotic resistance and an origin of replication site (Ori) so the plasmid can replicate independently of the host bacterial chromosome. (C) An electron micrograph of a relaxed circular DNA plasmid.
FIGURE 4.18 Plasmid DNA circles replicate independently of the host chromosome. Every time the bacteria carrying the plasmid divide, plasmid copies are transmitted to the offspring cells. Plasmids are often used in recombinant DNA research to transfer genes between cells.


THE ADVENT OF RECOMBINANT DNA
EXPERIMENTS


Boyer and Cohen Pioneer the Tools to
Transfer Genes


One could say that the field of DNA technology
came into being over pastrami sandwiches at a deli in
Waikiki Beach in 1972. It happened that Herbert Boyer
was attending a scientific conference in Hawaii where
he gave a talk about the new EcoRI restriction enzyme.
Fortunately, Stanley Cohen was in the audience. After
the session concluded, Cohen invited Boyer to
lunch to talk about a possible scientific collaboration.
(Both scientists are shown in Figure 4.21.) As it turned
out, this lunch meeting would send DNA technology
into a new era.

Cohen had been successfully experimenting with
plasmids, but he was having technical difficulties
cleaving the plasmid DNA, and Boyer’s new restriction
enzyme, EcoRI, seemed like the ideal solution. Cohen
proposed a collaboration, Boyer agreed, and the bargain
was struck. Boyer and Cohen joined forces, planning to
use Boyer’s EcoRI enzyme to cut two different plasmids
and then ligate the two plasmids together to form a
single, much larger plasmid (Figure 4.22). If
successful, they would then try to produce a recombinant
DNA molecule by inserting foreign DNA from a different
species into a bacterial plasmid.


FIGURE 4.19 Plasmid pSC101. Plasmid pSC101 was the first plasmid constructed with recombinant DNA experiments in mind. Plasmid pSC101 contains a unique EcoRI recognition site on the double-stranded plasmid DNA, an origin of replication that allows the plasmid to replicate in bacteria, and a gene that confers tetracycline resistance. The modern version of this plasmid has many additional restriction sites and other useful features.

Boyer and Cohen performed these pioneering
recombinant DNA experiments in 1973, 20 years after
Watson and Crick had published their historic article
on the structure of the DNA double helix. In their first
experiments, Boyer and Cohen successfully cut and
ligated together two different plasmids, pSC101
(containing the tetracycline resistance gene) and
pSC102 (containing a gene for resistance to another
antibiotic, kanamycin), and then used the ligated DNA
to transform E. coli host cells that were sensitive to
both antibiotics. Using antibiotic resistance, or another
method, to select for cells that carry the desired
recombinant DNA molecules was a crucial strategy for
Boyer and Cohen, and selection remains an important
part of recombinant DNA experiments today. At the start
of the experiment, the recipient bacterial cells were
sensitive to both antibiotics: if the cells were spread on
solid media containing either tetracycline or kanamycin,
the bacteria could not survive and grow.

Boyer and Cohen cut both plasmids, pSC101 and
pSC102, with EcoRI, mixed the two cut plasmids
together, treated them with DNA ligase, and then used
the plasmid mix to transform the antibiotic-sensitive
bacteria (Figure 4.23). Once inside the bacteria, the
antibiotic resistance genes on the recombinant plasmids
made proteins that rendered the bacteria resistant to
both tetracycline and kanamycin; therefore, these cells
could grow in the presence of both antibiotics.
Bacteria containing just one parent plasmid, either
pSC101 or pSC102, were resistant to only one antibiotic
and could not grow when both antibiotics were present.
FIGURE 4.20 Transformation is used to introduce DNA plasmids into bacteria to make a clone. (A) Recipient bacteria without plasmids are sensitive to the antibiotic tetracycline. (B) Plasmid DNA carrying a tetracycline resistance gene is used to transform the bacteria; each recipient cell receives no more than one plasmid, and most bacteria get none. (D) After treatment with the antibiotic, the surviving cells are tetracycline-resistant. (E) The transformed bacteria that are resistant to tetracycline divide and make more bacteria, all of which contain identical copies of the plasmid DNA. The culture is always grown on “selective medium” containing tetracycline to prevent the growth of any bacteria without a plasmid. The plasmids replicate independently of the bacterial chromosome and make many identical plasmid copies in each cell.
FIGURE 4.21 The first gene engineers. (A) Herbert Boyer. (B) Stanley Cohen. (C) Boyer and Cohen.
FIGURE 4.22 Using EcoRI enzyme to combine two different DNA plasmids. The restriction enzyme EcoRI is used to cleave the DNA of two different plasmids (blue and yellow) as indicated by red arrows. Each plasmid has a single EcoRI cleavage site, so cutting with EcoRI converts both plasmids into linear molecules. The EcoRI enzyme cuts DNA at a specific DNA sequence and produces fragments with sticky DNA ends. The cut plasmids are mixed together so that the EcoRI sticky ends can base pair, forming much longer DNA molecules, and DNA ligase seals the DNA backbones. Each ligated junction again contains an EcoRI site that can be cut by the EcoRI enzyme (blue arrows).
FIGURE 4.23 Boyer and Cohen cut and recombine two different plasmids. Two plasmids, pSC101 and pSC102, each containing a different antibiotic resistance gene, are cut with EcoRI, creating the same sticky ends on the DNA fragments. The sticky ends base pair together and the DNA ligase enzyme is added to make backbone bonds, sealing the plasmids together.
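The selection step in the pSC101 and pSC102 experiment follows a simple rule: a cell grows only if the plasmid DNA it carries supplies a resistance gene for every antibiotic in the medium. The toy model below uses hypothetical names. It represents the joined recombinant plasmid by listing both parent plasmids, since it carries both resistance genes, and it ignores real-world complications such as spontaneous resistance.

```python
# Resistance genes carried by each plasmid, as described in the text.
RESISTANCE = {
    "pSC101": {"tetracycline"},
    "pSC102": {"kanamycin"},
}


def resistances(plasmids: list[str]) -> set[str]:
    """Antibiotics a cell can tolerate, given the plasmid DNA it carries."""
    found: set[str] = set()
    for p in plasmids:
        found |= RESISTANCE[p]
    return found


def survives(plasmids: list[str], plate: set[str]) -> bool:
    """A cell grows only if it resists every antibiotic in the medium."""
    return plate <= resistances(plasmids)


# Hypothetical cell types after transformation with the ligation mix.
cells = {
    "no plasmid": [],
    "pSC101 only": ["pSC101"],
    "pSC102 only": ["pSC102"],
    "recombinant": ["pSC101", "pSC102"],  # joined plasmid carries both genes
}
double_plate = {"tetracycline", "kanamycin"}
for label, content in cells.items():
    print(f"{label:12s} grows on tet + kan: {survives(content, double_plate)}")
```

Only the recombinant cells grow on the plate containing both antibiotics, which is exactly why double selection picks them out of the mixture.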


In a second set of key experiments, Boyer and
Cohen decided to clone a gene from the African
clawed frog (Xenopus laevis) into the pSC101 plasmid
DNA vector (Figure 4.24). They cut both the frog DNA
and the pSC101 plasmid DNA with EcoRI and mixed
the cut DNA fragments together, allowing the EcoRI-cut
complementary single-stranded ends to base pair with
each other. Once DNA ligase had sealed the DNA
backbones, the new recombinant plasmid containing the
frog DNA was complete and could replicate in the
bacterial cells. The researchers called the recombinant
plasmid, made up of two different DNAs, a chimera,
after the creature of Greek mythology that was part
lion, part goat, and part serpent.
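At the sequence level, cloning an EcoRI fragment into a single EcoRI site is simple cut-and-join bookkeeping. The sketch below follows only the top strand and uses invented toy sequences rather than real pSC101 or frog DNA. It shows just one of the two orientations in which an insert can be ligated, and the function names are hypothetical. It also illustrates the point made in Figure 4.16: after ligation, each junction is again an EcoRI site.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT = 1  # EcoRI cuts between G and AATTC on the top strand


def open_circle(plasmid: str) -> str:
    """Cut a circular plasmid at its single EcoRI site (top strand only).

    Returns the linear top strand, which starts with AATTC... and ends
    with ...G, just like an EcoRI fragment. Assumes exactly one site
    and, for simplicity, that it does not span the sequence's end.
    """
    hits = [i for i in range(len(plasmid)) if plasmid.startswith(ECORI_SITE, i)]
    if len(hits) != 1:
        raise ValueError(f"expected one EcoRI site, found {len(hits)}")
    cut = hits[0] + ECORI_CUT
    return plasmid[cut:] + plasmid[:cut]


def ligate_insert(vector_linear: str, insert: str) -> str:
    """Join an EcoRI-cut insert into the opened vector (one orientation).

    Sticky ends pair and ligase seals the backbone; on the top strand this
    is end-to-end joining, and the result is read as a circle.
    """
    return vector_linear + insert


def count_sites_circular(seq: str, site: str = ECORI_SITE) -> int:
    """Count sites in a circular sequence, including one spanning the end."""
    wrapped = seq + seq[: len(site) - 1]
    return sum(wrapped.startswith(site, i) for i in range(len(seq)))


# Hypothetical toy sequences, not real pSC101 or frog DNA.
vector = "CCCCGAATTCTTTT"
frog_fragment = "AATTCAAAAAG"  # an EcoRI fragment: AATTC ... G

chimera = ligate_insert(open_circle(vector), frog_fragment)
print(chimera)                        # AATTCTTTTCCCCGAATTCAAAAAG
print(count_sites_circular(vector))   # 1
print(count_sites_circular(chimera))  # 2
```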

In the next experiment, the researchers decided to
see whether the cells carrying the recombinant plasmid
could express the foreign frog gene and direct the
synthesis of the frog protein in the bacterial cells.
Cohen grew a large number of E. coli cells containing
the recombinant plasmids and analyzed the proteins made
in the bacteria. He found that the E. coli cells did
produce the frog protein that is normally made only in
frog cells. They had moved the X. laevis frog gene into
a completely different organism and expressed a frog
protein in bacterial cells. The scientific media and the
national press trumpeted that the Boyer and Cohen
experiments had breached a theoretical barrier
separating biological species and launched the era of
recombinant DNA technology. As one observer noted,
“Biotechnology used to be BBC (before Boyer-Cohen);
now it is ABC (after Boyer-Cohen).” Many years later,
in 1996, Boyer and Cohen were honored with the
prestigious Lemelson-MIT Prize and shared the $500,000
award.

Those in the emerging science of molecular biology
were quick to grasp the implications of recombinant
DNA technology, and many scientists began to perform
their own gene manipulation experiments. Within
weeks, scientists were planning other gene transfer
and cloning experiments in an effort to cross other
species barriers. In one experiment, genes from the
common skin bacterium Staphylococcus aureus were
transferred into E. coli cells. Other scientists attempted
to isolate human genes and insert the human DNA into
plasmids to grow in bacteria. The scientific community
worldwide began to speculate on the powerful
implications of the first recombinant DNA experiments.
People predicted that genes could be inserted into
living cells to relieve genetic deficiencies and that rare
pharmaceutical proteins could be cheaply produced in
mass quantities. Many still dream of inexpensive
bioenergy sources and infants free of birth defects.
There seems to be no limit to what the new technology
might accomplish.


FIGURE 4.24 Boyer and Cohen cloned the first recombinant plasmid carrying foreign frog DNA (1973). (1) Plasmid pSC101 containing the tetracycline resistance gene (tetR) was cut with EcoRI (red arrows). (2) Donor DNA from the frog genome was also cut with EcoRI. (Note that EcoRI acts at the same recognition sequence in the plasmid vector and the donor frog DNA.) (3) Fragments of donor frog DNA are combined with the cut plasmid DNA. (4) Complementary base pairing takes place between the EcoRI-cut DNA ends. DNA ligase is added to seal the DNA backbone to form a recombinant DNA molecule: bacterial plasmid and frog DNA. (5) The recombinant plasmid DNA is introduced into tetracycline-sensitive E. coli bacterial cells by transformation, and the cells are then grown in medium containing tetracycline so that only the transformed cells survive. (6) The bacteria carrying the plasmids divide to form a clone of identical cells. (7) The frog DNA in the plasmid causes the production of frog proteins in the bacterial cells.


Initial Reactions: How Safe Is DNA 
Technology? 


Despite widespread enthusiasm among scientists for the
new recombinant DNA technology, some warned of
potentially dangerous consequences and suggested a
more cautious approach to recombinant DNA research.
Especially alarming to many was Paul Berg’s proposal
to insert genes from a cancer virus into bacterial cells.
Colleagues emphatically pointed out that if the E. coli
cells carrying the cancer virus DNA escaped from the
laboratory, the engineered bacteria could easily
populate the human intestine (where E. coli normally
thrives) and express the cancer genes in a human host.
Berg responded to the concerns of his scientific peers
and canceled the experiment. But the safety issues
extended far beyond the work in Berg’s lab, and over the
next few months, groups of scientists began to discuss
the safety issues surrounding the use of DNA technology
and recombinant DNA.
FIGURE 4.25 Scientists at the Asilomar Conference of 1975. Addressing the concerns of their own community as well as others, scientists drafted principles for containment facilities for DNA technology experiments.


In 1974, in an unprecedented move, Paul Berg
and ten other respected scientists wrote a letter that
appeared simultaneously in three prestigious research
journals: Science, Nature, and Proceedings of the
National Academy of Sciences. The letter, whose
publication coincided with a news conference, clearly
pointed out the potential danger:


Recent advances in techniques for the isolation and rejoin- 
ing of segments of DNA now permit construction of biologi- 
cally active recombinant DNA molecules in vitro. Although 
such experiments are likely to facilitate the solution of 
important theoretical and practical biological problems, 
they would also result in the creation of novel types of DNA 
elements whose biological properties cannot be completely 
predicted.... There is a serious concern that some of these 
DNA molecules could prove biologically hazardous. 


The letter written by Berg and his colleagues asked
scientists worldwide to observe a voluntary moratorium
on certain types of DNA experiments until an
international conference could consider necessary
safeguards. This was the first time that scientists had
ever voluntarily restricted research that had not been
proven dangerous. Although the recommendations did
not carry the force of law, scientists around the world
honored the request. In February 1975, a group of
139 leading researchers from 17 nations assembled for
a four-day meeting at Asilomar, in Pacific Grove,
California. The attendees developed guidelines and
recommendations for scientists conducting experiments
using recombinant DNA technology (Figure 4.25).
One goal of this meeting was to reassure an uneasy
public that the host bacteria used for recombinant
DNA experiments would be specifically bred and
genetically “disarmed” so they could not survive outside
the laboratory and would not grow in the human
intestine. Scientists also wrestled with the actual danger
posed by recombinant DNA work, because there was no
evidence either confirming or ruling out a hazard. The
scientists opted for caution and set about policing their
own research. Different recombinant DNA experiments
were classified as low risk, medium risk, or high risk,
and very risky experiments (such as those previously
proposed by Berg) would not be conducted until better
safety methods were developed. They discussed
strategies to contain the DNA experiments, including
physical and biological containment, and gave high
priority to developing strains of bacteria unable to
survive outside the laboratory.

The Asilomar recommendations led to the formation
of the Recombinant DNA Advisory Committee of the
National Institutes of Health (NIH), which soon issued
stringent guidelines that paralleled the Asilomar
recommendations. However, with additional years of
research, it became clear to scientists and the public
alike that recombinant DNA technology would not create
a dreaded “Andromeda bug” (a nod to the deadly microbe
of the novel The Andromeda Strain), and as a result the
regulations were periodically revised and relaxed. The
Recombinant DNA Advisory Committee at NIH continues to
serve as the watchdog over DNA technology research.


Scientists were alert to the potential dangers of gene experi- 
mentation and recombinant DNA technology. They organ- 
ized an international effort to carefully monitor and regulate 
scientific activity in this field; the NIH Recombinant DNA 
Advisory Committee grew and has continued to regulate 
recombinant DNA research for about 35 years. 


Recombinant DNA Techniques Become 
Widely Applied 


The successes in recombinant DNA technology gave 
rise to the discipline of biotechnology, which focuses 
on the commercial applications of molecular genetics. 
Biotechnology is a vast discipline in which the tools 
and concepts of molecular biology are used to solve 
a wide range of problems associated with pollution, 
food production, energy production, and synthesis of 
new medicines (Figure 4.26). Biochemists soon came 
to regard bacteria and other microbes as the chemical 
factories of the future and reasoned that they could be 
programmed with new genes to produce any number
of genetically engineered products with industrial, 
economic, and medical significance. 

During the 1970s, some of the promise of recombinant
DNA technology was fulfilled, and numerous
biotechnology companies began applying the techniques
of DNA science to manufacture useful products. By
1980, one company had successfully harvested insulin
from bacteria that carry human insulin genes. Other
research groups constructed recombinant plasmids in
bacteria that could produce human interferon (a virus
inhibitor) and human growth hormone, as well as many
other products such as vaccines and enzymes.


FIGURE 4.26 The impact of DNA technology and gene cloning. The two major objectives of recombinant DNA technology were to clone genes and accomplish the large-scale production of proteins.

The 1980s and 1990s saw the development of many
other products of DNA technology. Recombinant
bacteria were constructed that could break down oil
spills, dispose of toxic waste, and dissolve clogs in drains.
Recombinant bacteria were constructed to produce 
medically relevant proteins such as the human clot- 
dissolving enzyme, urokinase, and a kidney hormone, 
erythropoietin. A new form of forensic science called 
DNA fingerprinting won acceptance in the court sys- 
tem, gene therapy trials in humans began, and the first 
cloned mammal, Dolly, made her debut. Hundreds of 
companies worldwide were working on the industrial 
applications of DNA technology, and many scientists 
spoke optimistically of a future in which fertilizers 
would be obsolete, plants could use microbial toxins 
to drive off pests, and crops would be cultivated with- 
out the danger of frost damage. At the same time, new 
concerns surfaced about the safety of biotechnology 
research and development, especially with regard to 
the use of genetically modified organisms in food. 
Today, of course, both optimism and concerns
remain. Researchers have gone well beyond inserting
foreign DNA into bacterial cells, and DNA technologies
have dramatically affected many fields (Figure 4.26).
For instance, DNA chips allow researchers to monitor
the activation or suppression of thousands of genes
simultaneously; ordinary people can trace their
genealogies using DNA; genetic tests can identify risks
of some genetic diseases, allowing people to make
informed choices about their future; sequencing of
entire genomes has led to the birth of a new field,
bioinformatics (see Chapter 7); and genome DNA
sequencing is almost affordable for everyone. More than
anything, the advent of recombinant DNA technologies
gave humans the ability to exert control over the very
molecules they are made of. The most optimistic
scientists envision that humans will come to understand
the human body and all living things. Hardly any facet
of human existence will remain untouched by DNA
technology, and the social and political questions
raised will require solutions on a global scale. What if
DNA technology leads to gene tampering and efforts to
“improve” the human race? It is clear that ethical and
moral dilemmas will abound at every step of the genetic
path along which we are all traveling.


In addition to the exciting possibilities offered by recom- 
binant DNA science, new techniques and applications will 
always introduce some risk. Careful monitoring of DNA 
research is important to ensure public safety. 


Discussions of these and numerous other topics form 
the basis for the remainder of this book. As we will see in 
the pages ahead, DNA technology has entered the main- 
stream of human life and has become one of the most 
eloquent applications of scientific research. We con- 
clude the book with a chapter that discusses our current 
understanding of human genes and the human race. 


SUMMARY 


Genetic alterations using DNA technology permit
scientists to intervene directly in the fate of living
organisms. DNA technology has been made possible in
part by knowledge about microorganisms that resulted
from decades of studying bacteria. Interest in DNA
technology was heightened considerably with the
discovery of restriction enzymes, because these enzymes
cut a DNA molecule at a target sequence regardless of
the source of the DNA. Moreover, some restriction
enzymes, such as EcoRI, leave cohesive DNA ends that
scientists use to unite DNA fragments from different
sources to construct recombinant DNA molecules. The
DNA ligase enzyme forges permanent bonds in the DNA
backbone at the junctions between the cohesive ends
of the DNA molecules.


FIGURE 4.27 Facets of DNA technology. (A) Viral vector production. While producing viral vectors for gene therapy, scientists are protected by a biocontainment isolator. (B) Agricultural biotechnology. Cloned sundew plants, each started from a single cell of a parent plant, growing in a Petri dish on solid medium. (C) Biomedical biotechnology. Herceptin is a drug made by recombinant DNA technology that is designed to treat breast cancer. (D) Biomedical disease research. Transgenic mosquitoes can be used to fight malaria. These insect larvae carry a new gene that is expressed in their cells, as visualized by the fluorescent green color.

Another key development was the use of plasmids 
as vectors to carry foreign DNA fragments. Plasmids 
are circular double-stranded DNA molecules that 
occur naturally in bacteria. In 1973, Stanley Cohen and
Herbert Boyer inserted the DNA from a frog into a bac- 
terial plasmid and then grew the recombinant plasmids 
in host bacteria. The bacteria replicated the recom- 
binant plasmids and expressed the foreign frog protein 
in the cells, as well as the usual bacterial proteins. The 
era of recombinant DNA technology was launched. 

Safety concerns about recombinant DNA science 
brought scientists together at the Asilomar conference 
to discuss and establish guidelines for conducting 
future experiments in recombinant DNA technology. The 
modern Recombinant DNA Advisory Committee, the 
governing body for DNA technology, is a result of 
the Asilomar conference. As predicted in the 1970s,
recombinant DNA technology has had profound
consequences for basic research as well as for the
practical applications of DNA technology. Pharmaceutical
products, agricultural advances, genetic testing, and
forensic science are just a few fields that reflect the
impact of DNA cloning technology, as described in later
chapters of this book.


REVIEW 


This chapter concentrated on the scientific discoveries 
of the 1960s and 1970s that formed the basis on which 
DNA technology was developed. To test your knowl- 
edge of the chapter's contents, consider the following 
review questions: 


1. Describe some of the attributes of a bacterium and 
its DNA that make it a useful organism for study- 
ing DNA function. 

2. What are viruses, and how are they useful to DNA 
technology experiments? 

3. Explain the process of genetic recombination 
in bacteria and describe the relationship to 
recombinant DNA technology. 
4. Discuss the sources and nomenclature of
restriction enzymes, and explain how they are used in
the laboratory to study DNA.

5. What are ligases? What functions do they perform? 
How are they used in DNA technology? 

6. Describe some of the characteristics of plasmids, 
including their source, composition, method 
for insertion into bacteria, and value to DNA 
technology. 

7. Summarize the experiments performed by Boyer 
and Cohen that set the foundations for the 
development of DNA technology. 

8. Describe an example of how an experiment in 
DNA technology can be dangerous, and explain 
how scientists addressed safety in the 1970s. 

9. What are the functions of the Recombinant DNA 
Advisory Committee of the National Institutes of 
Health? 

10. List some of the possible uses of DNA technology 
for resolving human problems of a practical 
nature, and indicate some of the theoretical 
implications of DNA technology. 
Scientists Swap Genes in Bacteria


Associated Press, Updated: June 30, 2007 

By Lauran Neergaard 

Talk about identity theft: Scientists changed one species of 
bacteria into another by performing a complete gene swap. 

It’s a step in the quest to one day create artificial organ- 
isms, part of a bigger project to custom-design microbes 
that could produce cleaner fuels. But the way it was
performed, dubbed a “genome transplant,” has genetics
specialists buzzing. 

“This is equivalent to changing a Macintosh compu- 
ter to a PC by inserting a new piece of software,” declared 
genome-mapping pioneer J. Craig Venter, senior author of 
the new research published Thursday by the journal Science. 

For years, scientists have moved single genes and even 
large chunks of DNA from one species to another. But 
Venter’s team transplanted an entire genome, all of an organ- 
ism’s genes, from one bacterium into another in one fell 
swoop. 


Scientists have been introducing foreign genes and 
fragments of DNA into bacteria to clone, study, and 
exploit for the good of humankind since the 1960s. 
This chapter outlines the strategies and methods that 
scientists use to achieve a desired end in the labora- 
tory. The experiment described in the preceding article 
represents a step well beyond routine lab methods. But 
just as transferring a fragment of DNA between organ- 
isms was once very unusual and exotic, methods for 
transferring whole genomes will likely become routine 
in the near future. 

To capture a genome and transfer it wholesale into 
a different organism, scientists at the J. Craig Venter 
Institute carefully extracted the entire DNA genome
from one species of bacterium and introduced it into
cells of a closely related species. They coaxed a tiny
number of bacteria to pick up
the new genome DNA using a chemical cocktail that 
encourages bacteria to merge their membranes, and a 
few rare bacteria lost their own genome in favor of the 
new genome. As yet, the genome transfer mechanism 
involved is not fully understood. 

This experiment represents a novel and important
step along the path to making synthetic life. The
long-term goal of synthetic life is to create living
cells from scratch by synthesizing an entire genome
that will dictate everything about the host organism,
including where it can live, what it eats, what it
produces, and how it reproduces. All its genetic
characteristics will come from its “designer genome.”
This approach might work to design microbes that can
gobble up excess carbon dioxide (CO2) in our atmosphere,
produce biofuels cheaply and efficiently, or churn out
medicines by the ton (see Chapters 14 and 15).

The next challenge will involve actually synthesizing 
an entirely artificial bacterial genome in the laboratory; 
even the simplest living organism will probably require a
chromosome of more than 500,000 base pairs of DNA. 
This goal is a technological challenge, as to date the 
longest DNA molecule made in the lab is only about 
35,000 base pairs long. Venter’s team expects to synthe- 
size an entire genome, and then it will be a matter of 
getting the long synthetic DNA genome into the recipi- 
ent bacterium. At that point, Venter claims, he will have 
created the first synthetic life, taking the relatively new 
field of synthetic biology a fundamental step forward. 


LOOKING AHEAD 


This chapter describes the methods used by DNA 
technologists to clone and introduce genes and other 
DNA fragments into cells and stimulate those cells 
to produce the encoded proteins. On completing the 
chapter, you should be able to do the following: 


• List some criteria for selecting vectors to carry
genes or DNA fragments and for selecting host cells
to produce proteins.

• Describe two laboratory methods used to introduce
foreign DNA into cells.

• Conceptualize the steps involved in gene expression,
understand the types of problems that can
arise along this pathway, and explore how DNA
technologists resolve those problems.

• Explain the concept of a DNA library, and explain
why in some cases it is necessary to use complementary
DNA (cDNA) libraries rather than genomic DNA libraries.

• Discuss how DNA probes are used to screen for
genes in a DNA or cDNA library.

• Describe the process of polymerase chain reaction
(PCR), its advantages, and potential limitations.


INTRODUCTION 


In the real world of the research and development lab- 
oratory, DNA technology is a diverse blend of molecu- 
lar biology, genetics, chemistry, physics, mathematics, 
and high-tech biotechnology. Producing proteins and 
manufacturing molecules are among the most elegant
endeavors of the cell and of biological science in the
lab; both are widely used in applied and experimental
research today and will remain so in the future.

The science of recombinant DNA cloning allows 
scientists to use the natural biochemical processes of 
cells to do the work of copying DNA and producing
specific proteins. The biochemistry behind DNA technology
allows scientists to introduce new genes into cells. The 
cells then exhibit new biochemical activities as directed 
by the new gene that they carry. As the cells reproduce 
by cell division, they replicate the new gene along with 
their “original” chromosome DNA, producing millions 
of progeny cells containing millions of copies of the 
gene. In this way a gene is cloned and many identical 
copies are created. 
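The arithmetic behind “millions of progeny cells” is repeated doubling: if every cell divides once per generation and each daughter inherits the cloned gene, one transformed cell becomes 2^n cells after n generations. The sketch below is a minimal illustration under stated assumptions: one copy of the gene per cell, unlimited nutrients, and a default 20-minute doubling time (a figure often quoted for E. coli in rich medium, not a value given in this chapter). The function name is hypothetical.

```python
def progeny_after(hours: float, doubling_minutes: float = 20.0) -> int:
    """Cells descended from one transformed cell, assuming every cell
    divides once per doubling time and growth never slows down.
    With one copy of the cloned gene per cell (an assumption), this is
    also the number of gene copies."""
    generations = int(hours * 60 // doubling_minutes)  # completed divisions only
    return 2 ** generations

for h in (1, 4, 8):
    print(f"{h} h: {progeny_after(h):,} cells")
```

Real cultures stop doubling once nutrients run out, so these numbers are an upper bound; plasmid vectors that exist in many copies per cell would multiply the gene count further.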

The extremely powerful method of polymerase 
chain reaction (PCR) is routinely used to clone genes 
and other specific regions of DNA sequence in vitro 
(outside the cell), without copying the entire DNA 
genome. PCR harnesses the routine activity of heat-stable
DNA polymerase enzymes that normally replicate DNA
in microbial species that live in high-temperature
environments. Because these organisms grow in hot
environments, some of their enzymes are thermally
stable and retain enzyme activity even at the high
temperatures required for PCR. The enzyme technologies
used to manipulate and study DNA and RNA are often
based on understanding the processes that enzymes
normally perform in cells, then reproducing those
processes outside the cells using purified enzyme
proteins and appropriate substrates.
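PCR copies only the region of DNA lying between two short primers, one for each strand. The sketch below predicts that region for a template written 5'→3' as a text string: the forward primer has the same sequence as the top strand, and the reverse primer pairs with the top strand, so its reverse complement is what appears downstream. The template, primers, and function names are hypothetical, and the sketch assumes exact primer matches; it ignores mismatches, melting temperatures, and the chemistry of each cycle.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3' (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def predict_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the stretch of `template` that PCR with these primers would copy.

    The forward primer is searched for directly in the top strand. The
    reverse primer pairs with the top strand, so its reverse complement is
    searched for downstream of the forward site. Only exact matches count;
    returns None if either site is missing.
    """
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]

template = "AAAAATGCGTCCCGGGTTTGACTTCAAAA"  # hypothetical target DNA
product = predict_amplicon(template, fwd="ATGCGT", rev="GAAGTC")
print(product)  # ATGCGTCCCGGGTTTGACTTC
```

Under ideal conditions each PCR cycle doubles the amount of this product, so 30 cycles could in principle yield about 2^30 (roughly a billion) copies from one starting template.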


THE BIOCHEMISTRY OF RECOMBINANT 
GENE EXPRESSION 


Experiments involving cloning genes and expressing 
proteins require the use of host cells to receive the for- 
eign cloned gene. Some experiments use microorgan- 
isms or “microbes,” single-celled prokaryotes such as 
E. coli and Bacillus subtilis, and eukaryotes such as the 
budding yeast Saccharomyces cerevisiae (Figure 5.1). 
Microbes are often chosen as hosts for DNA cloning 
because they are relatively easy to grow in a laboratory, 
have been grown and studied extensively for decades, 
and have well-understood genetics that can be manip- 
ulated to make them appropriate hosts. Many types 
of cells can be converted into biochemical factories 
to produce various kinds of biomolecules. E. coli and 
B. subtilis are both commonly used as host cells for 
DNA cloning. Fortunately, humans have become very 
experienced at cultivating microbes cheaply and effi- 
ciently on large and small production scales. Over the 
centuries brewers and bakers have learned to employ 
yeast cells to manufacture beer, bread, and related food 
products (Figure 5.2). In terms of impact on human 
health, probably the most important products made 
by microbes for human use are antibiotics.
Since the 1940s, antibiotics have been mass-produced 
from microorganisms grown by the ton. 
FIGURE 5.1 Some of the microbes important in DNA technology. (A) An electron micrograph of Escherichia coli, the common bacterium
in the human and animal intestine. (B) Bacillus subtilis, a nonpathogenic bacterium often isolated from soil.
(C) Saccharomyces cerevisiae, the nonpathogenic budding yeast commonly used in genetic research, alcohol fermentation, and baking. 
FIGURE 5.2 Uses of S. cerevisiae through the ages. Humans have used brewer's yeast (S. cerevisiae) for millennia to make beer and ale,
and still do. (A) Brewers of the 1500s. Engraving is from the sixteenth century, by J. Amman. (B) A modern, personal-size brewing machine. 


Recombinant DNA technologies use enzymes that cleave or
copy DNA in living cells. The purified enzymes can perform
DNA manipulations in vitro in recombinant DNA experiments.


What Is DNA Cloning?


A clone is an identical copy. At its most basic level,
cloning DNA involves “cutting and pasting” DNA from
its original genome into a convenient carrier DNA
molecule called a vector. The vector molecule is then
introduced into a suitable host cell. Reproduction of 
the host cells includes replication of DNA, thereby 
creating identical copies of the inserted DNA along 
with the organism’s DNA. The details for accomplish- 
ing the goals of any DNA cloning procedure depend 
on the characteristics of the DNA of interest and the 
conditions required for transferring the cloned DNA of 
interest into the host cell or organism. 

Several important factors contribute to the success of 
a DNA cloning experiment. The target DNA coding for 
the gene of interest is isolated from the cells. Scientists 
usually choose the vector and host cells as a matched
pair, based on the specific requirements
of the experiment. For example, if the plasmid vector 
encodes a tetracycline (antibiotic) resistance gene, it is 
necessary to use a recipient host strain of E. coli that is 
tetracycline sensitive so that the bacterial cells cannot 
grow in the presence of tetracycline (see Chapter 4). 
Once a foreign gene is introduced into a host cell on a 
vector, the foreign gene can be turned on and
transcribed into mRNA, which is translated into protein
by ribosomes in the cell.


Cloning genes is a process where fragments of DNA are 
transferred from one organism to another, usually carried 
on a DNA vector. Microbes like bacteria are convenient 
carriers and hosts for cloning DNA. 


Vectors Are DNA Carrier Molecules 


Vectors are DNA molecules into which fragments of
DNA are inserted, cloned, and, in many cases, expressed
as RNA and protein. The vector carries the foreign 
DNA attached to the vector DNA. The host cells must 
support replication of the vector DNA. Replication is 
initiated and controlled by the replication origin DNA 
present on the vector. Vectors with appropriate DNA 
elements can replicate and produce large quantities of 
the gene (and the vector), and can be easily isolated for 
DNA sequencing and other gene studies. A promoter, 
or transcriptional control DNA region, carried on the 
vector can be designed to promote high levels of RNA 
expression from the cloned foreign gene, resulting 
in large amounts of the cloned protein. Because the 
DNA sequences required to initiate DNA replication 
(and transcribe and translate gene products) are differ- 
ent in prokaryotic and eukaryotic cells, it is important 
that the necessary DNA elements carried on the vector 
match the functions required by the host cells. 

In the early days of DNA research, the choice of vec- 
tors was limited to naturally occurring DNA molecules 
that carry a few genes. For instance, plasmid vectors 
permit the host cells to survive in the presence of
antibiotics because the plasmids carry antibiotic resistance
genes. By now there are many hundreds of modern 
vectors available in a variety of sizes, containing many 
different DNA elements used to control replication 
and transcription functions in both prokaryotic and 
eukaryotic host cells. Modern vectors offer every con- 
ceivable combination of selection genes, replication 
origins, cloning sites, and coding regions, all tailored 
to function in a large range of host cells. Most types 
of vectors are either commercially available or can be 
obtained on request from the scientist that constructed 
the vector. 

Cloning vectors contain special replication origins 
that permit the vector DNA to replicate and produce 
many copies of the vector inside growing bacterial 
cells. The replication origin that enables the vector 
to replicate also influences the vector’s host range. If 
the replication control elements in the vector do not 
match the type of DNA replication machinery used in 
the host cells, the vector cannot replicate and will be 
diluted out of the population as the cells grow. 
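Why a vector that cannot replicate is “diluted out” follows from simple counting: the number of vector molecules stays fixed while the number of cells doubles every generation, and each molecule can sit in only one cell. The sketch below assumes a single starting cell holding a hypothetical number of copies of a nonreplicating vector and gives only an upper bound, since it ignores how the copies happen to be split between daughter cells. The function name is hypothetical.

```python
from fractions import Fraction

def max_fraction_carrying(copies: int, generations: int) -> Fraction:
    """Upper bound on the fraction of cells still holding a vector that
    cannot replicate in this host.

    No new copies are made, so after n generations there are 2**n cells
    but still only `copies` vector molecules, each in at most one cell.
    """
    cells = 2 ** generations
    return Fraction(min(copies, cells), cells)

for n in (0, 5, 10, 20):
    print(n, float(max_fraction_carrying(copies=20, generations=n)))
```

Even starting with 20 copies, at most about 2 percent of the cells can still carry the vector after 10 generations, which is why a mismatched replication origin makes a vector useless in that host.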


DNA vectors can carry a fragment of DNA from one organ- 
ism to another and are specifically designed to perform the 
critical functions in the host cells such as replication of the 
vector and expression of the gene carried by the vector. 


Expression Vectors Can Produce Foreign 
Proteins 


Expression vectors contain different types of promoter 
elements that regulate the level of RNA
transcription of a gene carried on the vector. Promoters 
are often controlled by the cell to regulate RNA expres- 
sion, which in turn controls the amount of the protein 
produced. Expression vectors are designed to carry a 
cloned gene into cells and then produce the encoded 
protein. Promoter DNA elements vary between organ- 
isms and have different DNA sequences in prokaryotes 
and eukaryotes. Promoter information is important to 
consider when choosing a vector. Some vectors also 
include elements that function to enhance the stability 
of the mRNA and increase protein expression levels. 
Vectors are often supplied from a commercial ven- 
dor complete with all of the appropriate gene expres- 
sion functions needed. The researcher can further 
modify the vector using standard DNA cloning meth- 
ods to insert the desired DNA fragment, followed by 
transformation into bacteria. The efficiency of bacte- 
rial transformation (the number of transformed cells 
per amount of incoming DNA) depends on the ability 
of the treated bacterial cells to take up DNA from the 
environment, which is affected by many experimental
factors. Scientists often obtain vectors from colleagues 
doing related research, which is especially helpful in 
cases where vectors are designed for specific research 
purposes. Scientists can also obtain vectors and host 
organisms from sources such as the American Type 
Culture Collection (ATCC), a unique private, nonprofit 
resource dedicated to the collection, preservation, and 
distribution of scientifically relevant microorganisms, 
DNA libraries, and mammalian tissue culture cell lines. 
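Transformation efficiency, defined above as transformed cells per amount of incoming DNA, is commonly reported as colonies per microgram of DNA. Because usually only part of the transformation mixture is spread on the selective plate, the amount of DNA actually plated has to be worked out first. The function name and all numbers in the sketch below are hypothetical.

```python
def transformation_efficiency(colonies: int, dna_ng: float,
                              total_volume_ul: float,
                              plated_volume_ul: float) -> float:
    """Transformants per microgram of DNA.

    colonies:         colonies counted on the antibiotic plate
    dna_ng:           nanograms of vector DNA added to the competent cells
    total_volume_ul:  final volume of the transformation mixture
    plated_volume_ul: portion of that mixture spread on the plate
    """
    dna_plated_ug = (dna_ng / 1000.0) * (plated_volume_ul / total_volume_ul)
    return colonies / dna_plated_ug

# Hypothetical run: 1 ng of plasmid, 1000 uL final volume, 100 uL plated, 150 colonies
eff = transformation_efficiency(150, dna_ng=1, total_volume_ul=1000, plated_volume_ul=100)
print(f"{eff:.1e} transformants per microgram")
```

Here only 0.1 ng of DNA reached the plate, so 150 colonies correspond to about 1.5 million transformants per microgram.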


The choice of vector is determined by the goals of the 
experiment and by the characteristics of the host cells that 
will carry the vector and express the foreign DNA. 


Plasmid Vectors 


Plasmids are circular, double-stranded DNA molecules 
that replicate independently from the host chromosome 
(Figure 5.3; see Chapter 4). In nature, plasmids play a 
central role in transferring antibiotic resistance from one 
bacterium to another by exchanging plasmids between 
the two bacteria. Plasmids commonly serve as the start- 
ing DNA vector used to construct more complex types 
of DNA vectors. Plasmids are ideal to work with in the 
lab because they are small, circular double-stranded 
DNA molecules that are not easily broken by physical 
manipulation or by cycles of freeze-thawing during stor- 
age in the lab. In contrast, vectors made from long, lin- 
ear DNA molecules (such as the lambda phage genome) 
are easily broken by physical manipulation during 
experiments. Small plasmids are taken up more effi- 
ciently by bacterial host cells than are larger plasmids, 
and small plasmids are less likely to be damaged during 
purification and handling. 

The maximum length of foreign DNA that can 
be carried by a vector without interfering with vector 
replication is important to consider. Plasmid vectors 
can often carry DNA fragments ranging from hundreds 
of base pairs to much longer fragments of many thou- 
sands of base pairs. A unit of 1000 DNA base pairs 
is called a kilobase pair (kb). In the 1970s, Stanley 
Cohen and his research coworkers discovered that 
plasmid DNA circles could be cut at specific DNA 
sites by restriction enzymes and that certain DNA frag- 
ments could be connected together to form recom- 
binant DNA plasmids, or chimeras (Chapter 4). In the 
early days of DNA cloning few plasmids were avail- 
able for experiments in molecular biology. At the time, 
pBR322, a circular, double-stranded DNA plasmid,
was commonly used in DNA cloning experiments.
The entire DNA sequence of pBR322 is known, which 
means that all of the restriction enzyme recognition 
FIGURE 5.3 Circular DNA plasmids derived from E. coli. Plasmids
are circular double-stranded DNA molecules that replicate separately 
from the main chromosome. This electron micrograph shows many 
copies of a plasmid commonly used for recombinant DNA cloning 
work, pBR322, which is approximately 4300 base pairs (4.3 kb) in 
length. Many vectors in use today are derivatives of pBR322. 


sites are known. A simple restriction enzyme map of 
the pBR322 plasmid shows that inserting a DNA frag- 
ment into pBR322 can change the function of one of 
the antibiotic resistance genes encoded by the plasmid 
(Figure 5.4). 

The strategy to insert a DNA fragment into the 
BamHI site in the pBR322 plasmid begins by cutting 
the pBR322 plasmid DNA and the foreign DNA to be 
cloned with the same restriction enzyme, in this case, 
BamHI. This restriction enzyme cuts double-stranded 
DNA and leaves complementary overhanging ends 
(sticky ends) on the DNA molecules. The BamHI DNA 
sticky ends base pair together, temporarily connecting 
the pBR322 vector DNA to the foreign DNA fragment 
through weak hydrogen bonds between the com- 
plementary bases (A with T and C with G). To make the 
union permanent, the DNA ligase enzyme joins the 
sugar-phosphate backbones of the plasmid DNA to 
the backbone of the inserted DNA fragment, closing 
the circular recombinant DNA plasmid. 
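The cut-and-paste steps above can be sketched on sequence strings. BamHI recognizes GGATCC and cuts between the two Gs (G^GATCC), leaving four-base overhangs of GATC. Because GATC is its own reverse complement, any two BamHI ends can base pair, which is why a vector and a foreign fragment cut with the same enzyme fit together. The sketch below is a simplification under stated assumptions: it tracks only the top strand, treats the circular plasmid as a string whose BamHI site is not split across the two ends, and does not model ligase chemistry. The sequences and function names are hypothetical.

```python
SITE = "GGATCC"  # BamHI recognition sequence
CUT_OFFSET = 1   # BamHI cuts between the two Gs: G^GATCC

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def bamhi_cuts(seq: str) -> list[int]:
    """Top-strand positions where BamHI cuts (index of the first base after each cut)."""
    cuts = []
    i = seq.find(SITE)
    while i != -1:
        cuts.append(i + CUT_OFFSET)
        i = seq.find(SITE, i + 1)
    return cuts

def digest(seq: str) -> list[str]:
    """Top-strand fragments of a linear DNA molecule fully cut by BamHI."""
    fragments, start = [], 0
    for pos in bamhi_cuts(seq):
        fragments.append(seq[start:pos])
        start = pos
    fragments.append(seq[start:])
    return fragments

def ligate_into_site(plasmid: str, fragment: str) -> str:
    """Paste a BamHI-cut fragment into the single BamHI site of a plasmid."""
    cuts = bamhi_cuts(plasmid)
    if len(cuts) != 1:
        raise ValueError("expected exactly one BamHI site in the plasmid")
    pos = cuts[0]
    return plasmid[:pos] + fragment + plasmid[pos:]

foreign = "TTTGGATCCACGTACGTGGATCCAAA"  # hypothetical foreign DNA
vector = "AAAACCCCGGATCCTTTTGGGG"      # hypothetical vector with one BamHI site
insert = digest(foreign)[1]            # the piece between the two BamHI sites
recombinant = ligate_into_site(vector, insert)

print(insert)                      # GATCCACGTACGTG
print(reverse_complement("GATC"))  # GATC (the overhang is self-complementary)
print(recombinant)                 # AAAACCCCGGATCCACGTACGTGGATCCTTTTGGGG
```

Note that the recombinant molecule contains two BamHI sites, one at each junction, so the insert can later be cut back out with the same enzyme.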

The “parental” pBR322 plasmid DNA encodes anti- 
biotic resistance genes for ampicillin (ApR) and tetra- 
cycline (TcR) (see Figure 5.4). Cells carrying the parent 
pBR322 plasmids can grow in medium containing both 
antibiotics. The BamHI enzyme recognizes only one 
recognition site in the pBR322 plasmid DNA, which is 
located within the gene encoding resistance to tetra- 
cycline. The location of the BamHI site is a key feature 
FIGURE 5.4 pBR322 plasmid DNA is used as a vector to clone a foreign DNA fragment. (A) pBR322 (ApR, TcR) is shown with a very simple
restriction enzyme map indicating the sites where different restriction enzymes can cut the pBR322 DNA. The ampicillin and tetracycline
(antibiotic) resistance genes are indicated in red (ApR) and blue (TcR), respectively. The positions of the DNA replication origin and BamHI
cloning site inside the TcR gene are noted on the plasmid map. (B) This pBR322-1 map shows the site where the foreign DNA fragment (indicated
as a triangle) was inserted into the BamHI site to create the new pBR322-1 vector. The DNA insertion into the BamHI site of pBR322
inactivates the tetracycline resistance gene (TcS). Bacterial cells carrying the new pBR322-1 vector are sensitive to tetracycline because the
inserted DNA disrupted the tetracycline resistance gene, but they remain resistant to ampicillin.


in this cloning approach, because inserting a DNA
fragment into the BamHI site will disrupt the tetracycline
resistance gene on the plasmid. Cloning into the BamHI
site therefore offers scientists a way to select the rare
cells that have picked up a plasmid (they grow in
ampicillin) and to tell recombinant plasmids apart from
the parental plasmid (only the parental plasmid still
confers tetracycline resistance). The cells that pick up
the pBR322-1 DNA (the vector containing the inserted
DNA) are genetically different: they can grow in the
presence of ampicillin, but they cannot grow in tetracycline.
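The screening logic of insertional inactivation can be written as a small lookup of which cells grow on which antibiotic. The sketch below models only the three cases described here (no plasmid, parental pBR322, recombinant pBR322-1) and assumes a host strain that is sensitive to both drugs; the class and function names are hypothetical.

```python
from enum import Enum

class Plasmid(Enum):
    NONE = "no plasmid"
    PARENTAL = "pBR322"
    RECOMBINANT = "pBR322-1 (insert in BamHI site)"

def grows_on(plasmid: Plasmid, antibiotic: str) -> bool:
    """Predict growth of host cells assumed sensitive to both antibiotics."""
    if antibiotic not in ("ampicillin", "tetracycline"):
        raise ValueError(f"unknown antibiotic: {antibiotic}")
    if plasmid is Plasmid.NONE:
        return False  # no resistance genes at all
    if antibiotic == "ampicillin":
        return True   # ApR gene is intact in both plasmids
    return plasmid is Plasmid.PARENTAL  # the insert disrupts TcR in pBR322-1

for p in Plasmid:
    print(f"{p.value}: ampicillin={grows_on(p, 'ampicillin')}, "
          f"tetracycline={grows_on(p, 'tetracycline')}")
```

In practice, colonies that grow on ampicillin are then tested on a second plate containing tetracycline; those that fail to grow there are the candidates carrying the inserted DNA.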


Transferring DNA to Bacterial Cells 
in the Lab 


The genetic alteration of a cell resulting from the 
uptake of foreign DNA is called transformation. In 
a typical transformation experiment in the lab, the 
actively growing bacteria are collected by centrifuga- 
tion, and then suspended in ice-cold calcium buffer. 
Bacterial cells treated in this way are said to be “com- 
petent” for transformation with DNA. The recombinant 
vector carrying the foreign DNA fragment (for example, 
pBR322-1) is added to the competent cells, and the tube 
is subjected to a sudden increase in temperature, 
called a heat shock (see Chapter 4). Under these con- 
ditions, a very small number of bacterial cells take up 
the plasmid vector DNA. The low efficiency of plas- 
mid uptake is partly offset by the rapid division rates of 
the transformed bacteria carrying the plasmid vectors. 
These cells survive an appropriate antibiotic selection 
screen because they carry the recombinant plasmid 
DNA containing an intact antibiotic resistance gene. 
To be able to select for vectors in different types of 
host cells, vectors carry at least one selectable marker
gene. Selectable genes allow the scientists to detect 
which cells have taken up a plasmid and which cells 
have not. When using antibiotics for selection, cells 
that survive the DNA transformation procedure but do 
not receive a plasmid cannot grow because they do not 
have the antibiotic resistance gene, and will succumb 
to the actions of the antibiotic. In this way, the scientists 
can select for the growth of antibiotic resistant cells, 
those that harbor the plasmid DNA vector. 


Selectable marker genes allow detection of the rare bacterial 
cells that pick up the target vector DNA and become trans- 
formed. When antibiotics are used for selection, only the 
transformed cells carrying the vector are antibiotic resistant 
and can grow in the presence of the antibiotic. 


Viral Vectors 


Plasmids are great vectors used in bacterial cells for 
DNA cloning purposes and for some types of protein 
expression studies. But plasmids have some limitations 
in comparison to other vector options, including viral 
genomes. Many different kinds of viral genomes have 
been engineered for use as vectors to carry foreign DNA; 
several of these are described in later chapters cover- 
ing gene therapy and genetic diseases (see Chapters 10 
and 11). Viral vectors can carry long fragments of DNA 
(more than 20,000 base pairs, or 20 kb), but the fragment
length is limited by the genome packaging capacity of 
the virus protein coat or capsid. Viruses use efficient 
mechanisms to infect bacterial cells. When artificial 
viruses transfer a recombinant DNA vector genome into 
FIGURE 5.5 Bacteriophage lambda (λ) preys on E. coli cells. The λ phage uses naturally occurring receptor proteins on the surface of the
bacteria as landing sites to invade the cell. (A) An artist’s drawing of two λ phages attached to the surface of a bacterium. The virus on the
right has just injected its viral DNA genome into the cell (arrow). (B) Newly made phages are bursting out of a ruptured E. coli cell. (C) An
electron micrograph showing one double-stranded DNA genome from a single λ bacteriophage.


cells (in place of the virus genome), the transformation 
process is called transfection. 

Different types of viruses enter cells and launch infec- 
tions using different mechanisms. Many viruses attach 
to protein receptors on the cell surface, where they are 
taken into the cell by endocytosis. Other viruses actively 
inject the viral genomes directly through the membrane 
and into the cell. Once inside, the viral genome
usually follows one of two major pathways:
(1) The lytic pathway, in which the virus commandeers
the host cell transcription and translation machinery
to make viral proteins and copy the viral genome.
The host cell processes are suspended by the virus and 
each genome is packaged individually into a viral cap- 
sid and released as a virus from the dying cell, which 
further spreads the virus infection. (2) The lysogenic 
pathway, where the viral genome DNA integrates 
directly into the host chromosome DNA where the virus 
genome can remain indefinitely as a provirus. 

The lambda phage genome was used to generate many
types of bacterial vectors. Lambda (λ) is a
well-studied bacteriophage that infects E. coli cells
(Figure 5.5). Phage λ latches onto receptor proteins that
serve as docking stations on the outside of the bacterial
cell. The viral proteins build a channel through the cell
membrane and the phage λ DNA genome is injected into
the bacterial cell through the membrane channel. Once
inside the cell, the viral genome takes over the
protein-making machinery and uses host cell enzymes to make
the viral gene proteins and eventually more viruses.

To maximize the efficiency of gene transfer into
bacterial cells, scientists in the 1970s mimicked the
normal mechanisms used to package the λ genome into
a capsid in vivo, and used the same approach to package
and deliver lambda vectors. The scientists isolated
the viral proteins from infected bacteria and created a
test tube mixture of cellular components that works as
an in vitro virus packaging extract. The in vitro packaging
extract rapidly packages the lambda vector DNA,
along with whatever foreign DNA has been inserted,
into virus particles that will efficiently transfer the DNA
from phage to bacteria.

The packaged phage particles are applied to a plate of
E. coli bacteria in a dilute solution so that the recombinant phages
infect only nearby cells. Each infected cell releases 
new phages that infect nearby cells. Gradually a single 
lambda phage makes a plaque, a clear circle in a lawn 
of bacterial cells where the infected cells have died and 
the virus has infected the surrounding cells (Figure 5.6). 
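Because each plaque starts from a single infectious phage particle, counting plaques on a plate seeded with a known dilution gives the concentration of infectious particles in the original stock, usually reported as plaque-forming units (PFU) per milliliter. The function name and numbers in this sketch are hypothetical.

```python
def titer_pfu_per_ml(plaques: int, dilution: float, volume_plated_ml: float) -> float:
    """Infectious phage particles per mL of the undiluted stock.

    dilution:         fraction of stock in the plated sample (1e-6 for a million-fold dilution)
    volume_plated_ml: volume of the diluted sample applied to the bacterial lawn
    """
    if plaques <= 0:
        raise ValueError("need at least one plaque to estimate a titer")
    return plaques / (dilution * volume_plated_ml)

# Hypothetical plate: 250 plaques from 0.1 mL of a 10^-6 dilution
print(f"{titer_pfu_per_ml(250, 1e-6, 0.1):.2e} PFU/mL")
```

The same count is only meaningful when the plaques are well separated, which is one reason the stock is diluted so that each phage starts its own infection.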

Methods offering high efficiency transfection are 
particularly important for studies where the odds for 
successful cloning are low, for example, when manip- 
ulating large mammalian genomes and genome frag- 
ments. Success or failure in these cases can depend 
FIGURE 5.6 Phages form viral plaques on a plate containing a layer (lawn) of bacteria. (A) Healthy bacteria are grown in a continuous layer
or lawn across the solid agar in a Petri dish to use as a substrate for phage production. The phage stock is diluted in a solution, and applied to
the plate so that a single phage infects a single bacterial cell. The new phages are released from the dying cell and infect nearby bacteria in the
lawn. The death of a small neighborhood of infected bacterial cells causes a clear circle called a plaque that appears in the cloudy background
lawn of uninfected bacterial cells. (B) This plate contains viral plaques that appear as clear circles on the cloudy lawn of bacteria across the plate.


entirely on the efficiency of DNA transfer attained in 
the experiment. 


Viral vectors and packaging extracts offer scientists the 
option of packaging recombinant DNA into virus particles 
for efficient delivery into the bacterial host cells, a process 
called transfection. 


Cosmid Vectors 


The phage λ genome has the form of a double-stranded
linear DNA molecule during most of the phage life
cycle, but at some point the linear λ genome temporarily
converts to a circular form. Surprisingly, the phage
λ genome circularizes using the complementary sticky
ends (cohesive end sites, or cos) on the viral genome
DNA (Figure 5.7A). The single-stranded DNA bases on
each end of the λ genome resemble the sticky ends left
by restriction enzyme cleavage, but the cos ends are
much longer. Because the cos cohesive ends are long,
when the cos ends base pair together they make a
stable (though noncovalent) connection and form a
double-stranded circle that holds together until DNA
ligase joins the DNA backbones. In this way the λ cos
sites convert the double-stranded, linear genome into a
stable circular genome. Scientists used the λ cos DNA
sequences to create cosmid vectors.
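The stability argument can be made concrete by counting hydrogen bonds between paired overhangs: an A-T pair forms two and a G-C pair forms three. The sketch below compares the 4-base BamHI end with a 12-base overhang, the length usually given for the lambda cos ends. The 12-base sequence here is an illustrative GC-rich stand-in chosen for the example, and hydrogen-bond counts are only a rough proxy for how tightly two ends stay paired (base stacking also matters). The function names are hypothetical.

```python
HBONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # hydrogen bonds per base pair
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def can_anneal(end1: str, end2: str) -> bool:
    """True if two single-stranded overhangs (each written 5'->3') can base
    pair along their whole length. The strands are antiparallel, so the
    first base of end1 pairs with the last base of end2."""
    return len(end1) == len(end2) and all(
        PAIR[a] == b for a, b in zip(end1, reversed(end2)))

def hydrogen_bonds(end: str) -> int:
    """Hydrogen bonds formed when `end` pairs with its complementary end."""
    return sum(HBONDS[base] for base in end)

bamhi_end = "GATC"          # BamHI overhang (self-complementary)
cos_left = "GGGCGGCGACCT"   # illustrative 12-base overhang
cos_right = "AGGTCGCCGCCC"  # its complementary partner

print(can_anneal(bamhi_end, bamhi_end), hydrogen_bonds(bamhi_end))  # True 10
print(can_anneal(cos_left, cos_right), hydrogen_bonds(cos_left))    # True 34
```

Roughly three times as many hydrogen bonds hold the longer ends together, which fits the text's point that cos ends form a circle stable enough to persist until ligase seals it.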

A cosmid is a small double-stranded plasmid that 
has been engineered to contain the cohesive end sites 
(the “cos” of cosmid) that naturally occur on the lambda 
phage genome. The recombinant cos sequences in the
cosmid vector have the same function as they do in the 
phage; the cos sites convert the double-stranded linear 
DNA into a very large circular DNA molecule. 

During transformation, bacteria do not take up large 
DNA plasmids very efficiently; in addition, large plas- 
mids are unstable and are broken easily by manipu- 
lation. Cosmid vectors combine the advantages of 
plasmids and viruses. Cosmids can be introduced into
bacterial cells with high efficiency using in vitro
packaging, and once inside they replicate as plasmids,
independent of the host genome (Figure 5.7B). Cosmid
vectors can be used to carry DNA fragments that are much
longer (around 40 kb) than those typically carried by
virus or phage vectors.


Cosmid vectors enable us to take advantage of the helpful 
features of both plasmids and viral vectors. Cosmid vectors 
can carry very long DNA fragments, enter the cells with 
high efficiency, and replicate as independent plasmids in 
the cells. 
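The size limits quoted in this chapter suggest a rule of thumb for choosing a vector: pick the smallest vector type that can hold the insert. The sketch below encodes only rough figures, under stated assumptions: the plasmid limit of 10 kb is a hypothetical stand-in for the chapter's “many thousands of base pairs,” the lambda figure loosely follows “more than 20 kb,” the cosmid figure follows “around 40 kb,” and real capacities vary from one vector to another. Artificial chromosome vectors, described next, cover still larger inserts.

```python
# Rough upper limits on insert size in kilobase pairs (1 kb = 1000 bp).
# The plasmid figure is an assumption; the others loosely follow the chapter.
CAPACITY_KB = [
    ("plasmid", 10),
    ("lambda phage vector", 20),
    ("cosmid", 40),
]

def suggest_vector(insert_bp: int) -> str:
    """Return the smallest vector type whose rough capacity fits the insert."""
    insert_kb = insert_bp / 1000
    for name, limit_kb in CAPACITY_KB:
        if insert_kb <= limit_kb:
            return name
    return "artificial chromosome"

for size in (3_000, 15_000, 38_000, 150_000):
    print(size, "bp ->", suggest_vector(size))
```

A real choice also weighs the host cells, the selectable markers, and whether the protein must be expressed, as discussed earlier in the chapter.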


Artificial Chromosome Vectors 


A different approach to vector design began with the 
use of artificial chromosomes in the 1990s. Artificial 
chromosomes mimic native chromosomes and must 
include all of the DNA control elements necessary for 
the vector to behave just like a natural chromosome. 
This includes the appropriate DNA replication ori- 
gin for host cells, where DNA replication is initiated 
FIGURE 5.7 The lambda genome and the cosmid vectors both use “sticky” cos ends. The linear lambda genome forms a circle via its “sticky”
cos ends. (A) Inside the infected cell the lambda phage DNA genome exists either as a linear double-stranded molecule (top) or as a circular
molecule (bottom). The linear double-stranded DNA genome contains single-stranded sticky ends, cosL and cosR, on the left and right ends of
the genome, respectively. The cos single-stranded DNA ends are complementary; the single-stranded ends can base-pair with each other. When
it is the appropriate time in the lambda life cycle, the cos DNA ends base pair to each other and the viral genome forms a double-stranded circle
(bottom). (B) Cosmid vectors have the advantages that they can be packaged like lambda and they can form DNA circles resembling plasmids
when grown in bacteria in the lab. Cloning into cos vectors involves these steps: (1) The lambda DNA is cut once with a restriction enzyme that
leaves short cohesive ends at the DNA cut site. (2) Each phage genome is cut into two DNA fragments; each fragment contains short cohesive
ends left by the restriction enzyme on one end and the long single-stranded cos ends from the lambda genome on the other end. (3) A specific
DNA fragment or a cDNA is ligated to the short cohesive ends of both DNA fragments to create a recombinant molecule called a cosmid. (4) The
cosmid DNA vector and its cargo remain linear until they are inside the bacterium. (5) Once inside, the cos ends of the linear DNA base pair to each
other and convert the linear cosmid DNA into a double-stranded DNA circle.


and controlled, as well as functional centromere and 
telomere DNA sequences. Centromere DNA serves to 
attach the chromosome to the mitotic spindles during 
cell division, and its function is essential for correct 
chromosome segregation during mitosis. Without func- 
tional centromere DNA the offspring cells can inherit 
an unbalanced number of chromosomes, a condition 
that is usually lethal. The telomere ends of eukaryotic 
chromosomes play a key role in the specialized DNA 
replication mechanism that is necessary when cells rep- 
licate the ends of the chromosomes; telomeres protect 
the chromosome ends from becoming shortened during 
each cell cycle replication event (see Chapter 9). 


Artificial chromosomes carry long DNA fragments and 
contain all the DNA elements needed to ensure that they 
will replicate only during the DNA synthesis phase of the 
cell division cycle, along with the native chromosomes. 


One of the earliest artificial chromosome vectors was created from an unusually large circular bacterial plasmid called F’ (F prime), whose natural function is to allow a bacterium to transfer DNA to another bacterium by conjugation. The F’ plasmid was adapted for use as a bacterial artificial chromosome (BAC) vector, which can carry up to 350 kb (350,000 base pairs) of additional DNA.

The yeast artificial chromosome (YAC) was the 
first linear eukaryotic artificial chromosome ever 
made and was designed to carry DNA in budding 
yeast cells. Artificial chromosome vectors have now 
been developed that function in other model organ- 
isms as well, such as zebra fish and mouse. The uncut 
YAC vector is a circle of double-stranded DNA that 
becomes a linear chromosome in the yeast cells 
(Figure 5.8). YAC vectors have both bacterial and 
yeast replication origin sequences, enabling them to 
perform as shuttle vectors that move DNA between a 
prokaryote and a eukaryote. 
FIGURE 5.8 Yeast artificial chromosome (YAC) vector. In this case yeast refers to the budding yeast Saccharomyces cerevisiae. YAC vectors are typically built using DNA elements derived from S. cerevisiae chromosomes that are necessary for a chromosome to behave properly inside the budding yeast nucleus. The circular YAC molecule converts into a linear chromosome in yeast cells, which replicates and segregates when the yeast cells divide, just like a native linear chromosome. To function like a native chromosome, YAC vectors usually include: TEL: Telomere sequences are authentic yeast telomere DNA engineered into the circular YAC vector so that the vector can function as a linear chromosome inside the yeast cells. CEN: Centromere DNA (CEN4) is derived from authentic yeast centromeres and is required to ensure proper chromosome segregation (distribution) during mitosis (cell division), just as if the YAC were a native chromosome. ORI: Origins (ori) are required for the YAC DNA to replicate normally when the yeast cells divide. Selectable genes and/or markers: URA3, SUP4, and TRP1 are yeast genes that are used to select for the YAC in yeast cells. Genes are also included to select for YACs in bacterial cells.


Preparing a YAC requires the handling, separation, and purification of extremely long DNA molecules, including the yeast chromosomes, which range from 225 kb to nearly 2000 kb (2 million base pairs) in length. This can be accomplished using a special type of gel electrophoresis, pulsed-field gel electrophoresis (PFGE), that can resolve very long DNA molecules. Other species-specific artificial chromosomes have been developed, including human artificial chromosomes (HACs), which hold particular promise for applications in gene therapy (see Chapter 11). When DNA is added to human cells, it tends to integrate randomly into the cell genome, where it can disrupt important genes or, alternatively, be inserted into a transcriptionally silent region of the genome where it is not expressed. These disadvantages are avoided when the gene is carried on an artificial chromosome, because the artificial chromosome does not integrate into the existing DNA and replicates independently of the main genome. HACs also offer a way to study aspects of chromosome function and to explore the roughly 98% of human DNA that does not encode proteins and whose functions are not yet well understood.
Artificial chromosomes can carry large DNA fragments
and replicate independently along with the native chromo- 
somes, by virtue of DNA sequences that are necessary for 
centromere, telomere, and replication functions. 


Choosing Host Cells for Cloning 


Choosing a suitable host cell and vector combination depends on the specifics of the planned cloning experiment. If the goal is to clone a gene to produce large amounts of its DNA for further studies, then bacteria are appropriate host cells. Bacterial cells are easily transformed by plasmid DNA, and once inside the cell, the recombinant plasmid replicates to produce many copies. This is one way to make a large quantity of plasmid vector DNA that is easily isolated from the bacterial cells.

Because vectors carry DNA control elements that 
must function in the host cells, the vector and host 
cells are often considered together. Vectors carry DNA 
elements that function as replication origins, which are sites on a DNA molecule where DNA replication begins.
Although the fundamental mechanism of DNA replica- 
tion is the same in prokaryotes and eukaryotes, the ori- 
gin sequences involved in initiating DNA replication 
are very different. So a plasmid outfitted for replication 
in E. coli, for example, would not be able to replicate 
in yeast or mammalian cells. 
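The rule that a vector replicates only in hosts whose replication machinery recognizes one of its origins can be captured as a small data model. This is a hypothetical illustration: the `Vector` class, the host labels, and the example vectors are invented names, and real origin compatibility depends on the specific origin sequences involved.

```python
from dataclasses import dataclass, field


@dataclass
class Vector:
    """Hypothetical model of a cloning vector and the hosts its origins work in."""
    name: str
    # Host types whose replication machinery recognizes an origin on this vector
    origin_hosts: set[str] = field(default_factory=set)


def can_replicate(vector: Vector, host: str) -> bool:
    """A vector replicates in a host only if it carries an origin for that host."""
    return host in vector.origin_hosts


def is_shuttle_vector(vector: Vector, hosts: tuple[str, ...]) -> bool:
    """A shuttle vector carries origins that work in every listed host."""
    return all(can_replicate(vector, h) for h in hosts)


if __name__ == "__main__":
    plasmid = Vector("E. coli plasmid", {"E. coli"})
    yac = Vector("YAC", {"E. coli", "yeast"})  # bacterial and yeast origins
    for v in (plasmid, yac):
        print(v.name, "replicates in yeast:", can_replicate(v, "yeast"),
              "| shuttle vector:", is_shuttle_vector(v, ("E. coli", "yeast")))
```

Modeling origins as a set makes the shuttle-vector idea explicit: adding a second host's origin is what lets a vector such as a YAC move DNA between a prokaryote and a eukaryote.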

A suitable host must be easy to cultivate in the laboratory, and its cells must carry the genes and control elements needed to accommodate the other requirements of the experiment. In addition, scientists must have a genetic selection or screen that will permit detection of the transformed cells containing the recombinant vectors.

E. coli was used in the earliest DNA technology experiments. By the early 1950s the genetics of E. coli was well established, and E. coli played a key role in the bacterial conjugation experiments of the 1950s and 1960s. In addition, E. coli bacteria were used to solve the puzzles surrounding the process of protein synthesis in experiments performed during the 1960s and 1970s. E. coli continues to be the major host cell used for DNA cloning and protein expression experiments, in part because bacterial cells divide rapidly. Under ideal growth conditions in the laboratory, a bacterial cell can divide to make two cells every 20 minutes or so, compared to 90 minutes to two hours for wild-type budding yeast cells and 24 hours or more for mammalian cells. An alternative prokaryotic host is the bacterium Bacillus subtilis (see Figure 5.1). This rod-shaped, nonpathogenic bacterium was first used in 1958, and since then its genetics have been thoroughly studied. The naturally occurring plasmids in B. subtilis have been scrutinized in detail, and the bacteriophages that infect it have been well studied, making B. subtilis an attractive cloning host. Strains of B. subtilis actively export proteins out of the cell by secretion, and scientists have used vector DNA sequences to direct cloned foreign proteins to be secreted, a useful alternative to E. coli for many protein expression studies. As a result, B. subtilis has been used to produce antibiotics, insecticides, and industrial enzymes, among other products.


Vectors and host cells are often chosen together, since the vector must replicate inside the host cells. Prokaryotic cells like E. coli and B. subtilis divide quickly and are easily grown in the lab, making them commonly used host cells for DNA cloning experiments.


Although E. coli has been the prokaryotic workhorse of molecular genetics for many years, bacteria have some limitations as expression systems for eukaryotic proteins. Indeed, certain experiments in DNA technology require eukaryotic cells, or, in some cases, mammalian cells to be successful. All proteins must fold properly into specific three-dimensional shapes in order to function in the cell. Eukaryotic proteins often fold incorrectly when expressed in prokaryotic cells, resulting in biologically inactive proteins. Furthermore, bacteria lack many of the enzymes that carry out the posttranslational modifications of proteins that normally take place in eukaryotic cells. These modifications are chemical alterations of newly made proteins, such as the covalent attachment of phosphate groups, sugar (carbohydrate) chains, lipids, and other types of small molecules.


Many eukaryotic proteins fold incorrectly or fail to be modified correctly when expressed in bacteria, and therefore expression of cloned eukaryotic proteins often requires a eukaryotic host.


Box 5.1 The E. coli Family Tree


The Good, the Bad, and the Ugly


The bacterial species E. coli is actually a family of related groups of bacteria called bacterial strains. Many strains of E. coli are beneficial to humans. One of the most common strains of E. coli lives peaceably and beneficially in the human intestines, where it helps with food digestion and represses harmful bacteria. The strain of E. coli so widely used in the laboratory is harmless to humans and is an excellent host for recombinant DNA vectors.

Other strains of E. coli have attracted attention because of their devastating and sometimes gruesome effects on human health. One is an extremely rare strain that rapidly destroys soft tissue, known as “flesh-eating” bacteria. But this rather shocking term is not correct: the bacteria do not eat flesh, but they do secrete toxins that kill skin cells and other tissues, rapidly spreading the infection throughout the body. The flesh-killing E. coli is a rare strain; other bacterial species dominate this grisly category.

Another infamous E. coli strain, O157:H7 (the numbers and letters of strains refer to antibody reactions used to identify them), lives in the intestines of about half of all cattle used for meat products. E. coli O157:H7 is harmless to cows but not to people, although the meat is safe to eat if it is processed and cooked correctly. Infections caused by E. coli O157:H7 can be deadly, particularly in children, the elderly, and immune-compromised individuals. Cases of human illness have been traced to contamination of the public water supply with E. coli O157:H7, usually as a result of careless methods of animal slaughter and poor waste management. Outbreaks of E. coli O157:H7 have been linked to drinking water sources contaminated by runoff water from a cattle farm.

What makes the E. coli O157:H7 strain deadly for people? E. coli O157:H7 cells secrete the Shiga toxin, which kills cells while spreading throughout the body, ultimately causing organs to shut down. As yet there is no effective antidote or cure for E. coli O157:H7 infection, but with appropriate medical care, most healthy people recover from an infection in 5 to 10 days.

How did some ancestor of the well-trusted E. coli bacteria living in our intestines acquire the toxin gene necessary to make such a devastating pathogen? During a bacteriophage infection, the phage genome DNA, including the Shiga toxin gene, was integrated into the bacterial chromosome, establishing a permanent prophage (integrated phage) copy of the phage DNA and the toxin gene in E. coli O157:H7.

Ironically, the introduction of low doses of antibiotics into cattle feed in the 1950s might have activated the dormant prophage copies lurking in the E. coli O157:H7 chromosomes and triggered toxin production. The 1950s also saw the first cases of an infection in children that we now know was caused by a pathogen related to E. coli O157:H7.

Many restaurant menus now warn that your steak or burger is only guaranteed to be safe when cooked medium-well. This is because one way to kill O157:H7 is to cook the beef to 160°F (about 71°C) throughout. Research is under way to find a vaccine that would protect cattle from E. coli O157:H7.
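To put the host-cell division times quoted earlier in perspective (about 20 minutes for E. coli, 90 minutes for budding yeast, and 24 hours for mammalian cells), it helps to count generations. This sketch assumes ideal, uninterrupted exponential growth; real cultures slow down and stop dividing as nutrients run out, so the numbers are upper bounds.

```python
# Approximate doubling times (minutes) taken from the text; real values vary
# with strain, medium, and temperature.
DOUBLING_MINUTES = {
    "E. coli": 20,
    "budding yeast": 90,
    "mammalian cells": 24 * 60,
}


def generations(hours: float, doubling_minutes: float) -> float:
    """Number of doublings in the given time, assuming constant exponential growth."""
    return hours * 60 / doubling_minutes


def fold_increase(hours: float, doubling_minutes: float) -> float:
    """Each generation doubles the cell number, so growth is 2 ** generations."""
    return 2 ** generations(hours, doubling_minutes)


if __name__ == "__main__":
    hours = 12
    for host, t in DOUBLING_MINUTES.items():
        n = generations(hours, t)
        print(f"{host}: {n:.1f} generations, about {fold_increase(hours, t):.3g}-fold more cells")
```

In 12 hours of ideal growth, a single E. coli cell could in principle go through 36 generations (about 7 × 10^10 cells), a yeast cell through 8 generations, and a mammalian cell would not yet have completed one division.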


Scientists study the expression of eukaryotic genes in various host cells or organisms such as mouse, yeast (S. cerevisiae), fruit fly (Drosophila melanogaster), and roundworm (C. elegans) (Figure 5.9). Gene expression is also studied in human tissues and cell lines. For each type of organism or cell, specific experimental protocols are available to introduce vector DNA into the host, ranging from treatment with calcium and heat shock in bacteria, to injecting plasmid DNA into some large mammalian cells and amphibian egg cells, to creating transgenic animals and plants (see Chapter 14 and Chapter 15). The characteristics of the vectors used for protein expression are considered later in this chapter.

FIGURE 5.9 Model organisms are often used to study gene expression and development. (A) Adult zebra fish. (B) Transparent zebra fish. This adult zebra fish was engineered to be transparent to give scientists a better view inside the functioning fish. (C) Scientists can observe embryo development directly in this transparent zebra fish embryo. (D) Drosophila (fruit fly) is a very important model system for studying genes and development. (E) Image of Drosophila embryos just after fertilization shows the dynamics of morphogen molecules that communicate positional information to individual nuclei in the embryo. The nuclei were fluorescently labeled to show the Bicoid protein (blue), the Hunchback protein (green), and DNA (red). (F) The red dots in this Drosophila embryo image indicate the positions of specific RNA:protein complexes (U7 snRNPs), which are involved in mRNA metabolism during embryo development. (G) Early Drosophila embryo shows that the segmentation genes encoding the Ftz protein (brown) and the Odd RNA (blue) are expressed in a series of alternating stripes that appear fuzzy because this embryo has a mutation that causes the expression patterns to overlap. (H) Confocal microscopic images of Drosophila embryos with DNA (white) and the expression of different genes shown in red and green. The tilted embryo (top right of panel) reveals the interior of the embryo.

FIGURE 5.9 (Continued) (I) This Drosophila embryo shows the expression of gene regulators during early embryo development. Each color represents the expression of one protein: Knirps (green), Kruppel (blue), Giant (red). The darker areas of the embryo contain cells that do not express these genes, and the yellowish areas are regions of cells that express both Knirps and Giant. (J) This C. elegans roundworm carries a GFP gene that expresses the Green Fluorescent Protein in certain cells near one end of the roundworm. (K) This mouse embryo contains a reporter gene that makes beta-galactosidase, producing a blue color in the cells that normally express the mouse noggin gene, as revealed by the blue color in the brain and in the skeleton. (L) An adult lab mouse at the National Institutes of Health (NIH).


DNA LIBRARIES STORE CLONED DNA SEQUENCES


Scientists who work with genes would face an almost impossible task if the genes or other DNA sequences being studied had to be reisolated from the original cells or tissues every time anyone wanted to perform an experiment on them. For this and other reasons, DNA libraries were developed to give scientists a convenient way to store and catalog DNA fragments (see Chapter 7). Libraries make it easier to identify DNA fragments and to retrieve a cloned copy of a gene from the collection. A DNA library is a collection of host cells containing vectors with cloned DNA fragments that together represent all of the DNA sequences in question. There are two main types of DNA libraries: the genomic DNA library and the complementary DNA (cDNA) library.


A Genomic Library Contains All the Sequences in a Genome


A genomic DNA library contains DNA fragments representing the entire genome of an organism. Thus, in addition to the protein-coding genes, a human genomic library also includes the roughly 98% of the human genome made up of non-protein-coding DNA sequences such as promoters, introns, and highly repeated DNA sequences. In contrast, a cDNA library contains only sequences that were transcribed into mRNA in the cell type used to generate the library. The bacteria that carry this type of DNA library contain cloned cDNA copies of the mRNAs that were being expressed, and translated into proteins, in the cells used to make the library. A cDNA library is therefore much smaller than a genomic library from the same organism, because only the genes that were being actively expressed in the cells are included. Because eukaryotic genes are interrupted by introns and embedded in much more noncoding DNA than prokaryotic genes, a cDNA library is especially valuable for isolating the protein-coding sequences of eukaryotic genes.

To begin the process of making a genomic library, 
the genome DNA is isolated from the organism of inter- 
est and is cut into fragments by a restriction enzyme 
or by a combination of restriction enzymes to gener- 
ate the desired range of DNA fragments for cloning. 
The fragment size range desired depends on the size 
of the genome and the intended vectors for the library. 
For example, if the target genome is to be cloned in 
a bacterial vector and host (Figure 5.10), the DNA is 
cut into fragments of about 10 to 15kb, and then the 
vector and the DNA fragments are ligated together and 
used to transform E. coli cells. The bacteria are grown 
in medium containing an appropriate antibiotic for 
selection of transformed cells. Because the plasmid 
vector carries an antibiotic resistance gene, cells that 
pick up plasmids during the transformation process 
will survive in the presence of the antibiotic, whereas 
cells without plasmids will not survive. Each bacterium 
in the library contains a vector carrying one of the spe- 
cific DNA fragments cut from the genome, and each 
bacterium will make many identical copies, or clones, 
of the vector containing the fragment. Genomic libraries are redundant; each library contains the equivalent of at least three copies of every DNA sequence in the genome, which makes it very likely that every region of the genome is represented and that the DNA inserts in the library overlap.
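How many clones a genomic library needs, and why threefold coverage makes full representation likely but not certain, can be estimated with the Clarke-Carbon formula N = ln(1 − P) / ln(1 − f), where f is the insert size divided by the genome size and P is the desired probability that any given sequence is present. The sketch below applies that formula; the genome sizes (about 4.6 million bp for E. coli and about 3.1 billion bp for the human genome) are approximate values chosen for illustration, and the coverage estimate uses the standard Poisson approximation, which this chapter does not itself state.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, prob: float) -> int:
    """Clarke-Carbon estimate of library size.

    N = ln(1 - P) / ln(1 - f), where f = insert size / genome size and P is the
    probability that any particular sequence is present in the library.
    """
    f = insert_bp / genome_bp
    return math.ceil(math.log(1 - prob) / math.log(1 - f))


def fraction_covered(coverage: float) -> float:
    """Expected fraction of the genome represented at a given fold coverage
    (Poisson approximation: 1 - e^-coverage)."""
    return 1 - math.exp(-coverage)


if __name__ == "__main__":
    insert_bp = 15_000  # upper end of the 10-15 kb range in the text
    for name, genome_bp in (("E. coli", 4.6e6), ("human", 3.1e9)):
        n = clones_needed(genome_bp, insert_bp, 0.99)
        print(f"{name}: about {n:,} clones for a 99% chance of finding any sequence")
    print(f"3-fold coverage represents about {fraction_covered(3):.0%} of the genome")
```

Under these assumptions, threefold coverage still leaves roughly 5% of the genome unrepresented. Raising the redundancy improves the odds, and larger inserts, such as those carried by BAC or YAC vectors, sharply reduce the number of clones needed.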

FIGURE 5.10 Constructing a genomic library. (A) Genome DNA is isolated from target cells. (B) The genome DNA is cut into DNA fragments using a restriction enzyme. (C) Plasmid DNA vectors are grown in and isolated from bacterial cells. (D) Plasmid vector DNAs are cut with the same restriction enzyme as in part B. (E) DNA fragments are mixed with the cut plasmid DNA, the single-stranded sticky ends base pair, and the DNA ends are ligated together to yield recombinant plasmids. (F) The recombinant plasmid DNA is used to transform the bacterial cells, converting them from antibiotic sensitive to antibiotic resistant; the plasmids replicate independently of the bacterial chromosome during cell division, so that many additional plasmid copies are made in each cell. During transformation, each transformed bacterial cell picks up one molecule of recombinant DNA, becomes resistant to the antibiotic used to select for transformed cells, and grows into a colony made up of cells carrying identical recombinant plasmids. The entire population of bacterial cells containing all of the recombinant plasmids constitutes a genomic library.


Large DNA fragments can be cloned using a YAC vector. First the YAC DNA is cut with both BamHI and SnaBI (Figure 5.11A). BamHI cuts at two sites, releasing a BamHI fragment that is discarded and exposing the two telomere (TEL) sequences at the ends of the vector. The left arm of the YAC contains the TRP1 gene, and the right arm carries the URA3 gene. The YAC DNA is mixed with the fragmented genome DNA, and the ligation reaction joins the two arms via the DNA fragment inserted between them. Yeast cells are transformed with the DNA ligation mix and then spread onto agar plates containing medium that selects for cells transformed by the ligated DNA. The phenotypes of the transformed yeast cells are then tested and compared with those of the host yeast cells, which carry mutations in the ura3 and trp1 genes and lack SUP4 function. Yeast cells transformed with a YAC vector that lacks a DNA insert are positive for URA3, TRP1, and SUP4 function. These cells can be distinguished from transformed cells carrying YAC vectors with DNA inserts, which test positive for URA3 and TRP1 but are defective in SUP4 function, because DNA inserted into the SnaBI site disrupts the SUP4 gene. A special type of gel electrophoresis can be used to separate large DNA molecules cloned using YACs, as well as the native yeast chromosomes (Figure 5.11B).
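The logic used to sort yeast transformants can be written as a small decision function. This is a simplified sketch of the reasoning described above, not a laboratory protocol: it assumes the selective plates lack uracil and tryptophan, so only cells carrying both YAC arms (URA3 and TRP1) grow, and it treats SUP4 function as a simple yes/no readout. The names `Transformant` and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transformant:
    """Hypothetical record of the marker phenotypes scored for one yeast colony."""
    ura3_positive: bool  # URA3 marker (one YAC arm)
    trp1_positive: bool  # TRP1 marker (the other YAC arm)
    sup4_positive: bool  # SUP4 is disrupted when DNA is inserted at its SnaBI site


def classify(cell: Transformant) -> str:
    # Assumed selection: without both arms the cell cannot grow on plates
    # lacking uracil and tryptophan.
    if not (cell.ura3_positive and cell.trp1_positive):
        return "does not grow on selective medium"
    # Both arms present: an intact SUP4 gene means no DNA was inserted.
    if cell.sup4_positive:
        return "YAC without insert"
    return "recombinant YAC with insert"


if __name__ == "__main__":
    colonies = [
        Transformant(True, True, True),
        Transformant(True, True, False),
        Transformant(False, True, False),
    ]
    for c in colonies:
        print(c, "->", classify(c))
```

Checking the two arm markers first mirrors the order of the experiment: selection removes cells without a complete YAC, and the SUP4 test then screens the survivors for inserts.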
FIGURE 5.11 DNA cloning using a YAC vector. (A) The cloning procedure shows how a double-stranded DNA fragment is inserted into the SnaBI site in the YAC DNA, which physically connects the left and right arms of the YAC DNA and generates a linear YAC molecule. The linear YAC chromosome replicates along with the native chromosomes in the cell. The vector contains sequences for centromere, telomere, and replication origins, as discussed earlier. The YAC vector DNA is cut with SnaBI and BamHI. Insertion into the SnaBI site disables the function of the SUP4 gene, which allows scientists to identify the transformed cells that carry an insert. (B) Pulsed-field gel electrophoresis (PFGE) is used to separate native, uncut yeast chromosomes. The white bands on the PFGE image represent full-length double-stranded DNA molecules (the yeast chromosomes), listed by length, ranging from 1900 kb (1,900,000 bp; 1.9 Mb) to 225 kb (225,000 bp).


To construct a genomic library, the genome DNA of the 
organism to be cloned is cut into overlapping DNA frag- 
ments that are inserted into the vector. The vectors and the 
inserted fragments are introduced into host cells by trans- 
formation. Taken together, these cells, vectors, and genomic 
DNA inserts constitute a recombinant DNA library. 


A Complementary DNA (cDNA) Library Stores Expressed Genes


A cDNA library contains only transcribed sequences, 
eliminating the nontranscribed sequences and introns 
from the eukaryotic DNA cloned into the cDNA library. 
Most eukaryotic genes in vertebrate genomes, including 
the human genome, contain introns and exons. When 
expressed in the cell, an interrupted gene is copied into 
a precursor RNA that is much longer than the fully proc- 
essed mRNA. Cellular RNA splicing complexes remove 
the intron sequences in the precursor RNA, producing 
a final, mature mRNA (Figure 5.12). The mature mRNA
contains contiguous exon sequences, which are trans- 
lated to make the encoded protein. The cDNA copies 
of the mRNA sequences are inserted into the vector and 
transformed into a prokaryotic host such as E. coli. 
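Splicing can be illustrated by treating the precursor RNA as a string and cutting out the intron intervals. This is a toy sketch: the sequence and intron coordinates are invented, and real splicing depends on splice-site signals recognized by the cell's splicing complexes, which the code does not model.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Return the mature mRNA by removing introns.

    Each intron is a half-open (start, end) interval of 0-based positions in
    pre_mrna; the exon pieces left between them are joined in order.
    """
    exons = []
    pos = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[pos:start])  # exon that precedes this intron
        pos = end                          # skip over the intron
    exons.append(pre_mrna[pos:])           # final exon
    return "".join(exons)


if __name__ == "__main__":
    # Invented example: exon 1 + intron + exon 2 (the intron starts with GU, ends with AG)
    pre = "AUGAAA" + "GUAAGUAUAG" + "CCCUAA"
    mature = splice(pre, [(6, 16)])
    print(len(pre), "nt precursor ->", len(mature), "nt mature mRNA:", mature)
```

The mature product contains only the joined exons, which is exactly the sequence a cDNA copy will carry.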

The bacteria that make up the library contain vectors carrying complementary DNA copies of the mRNAs present in the cells at the time the library was constructed. cDNA libraries have major advantages when it comes to searching for a target gene encoding a protein, but they cannot be used to isolate DNA regions that are absent from mature mRNA, such as regulatory sequences and introns. cDNA libraries are far more compact and carry discrete copies of each expressed gene, rather than the arbitrary restriction fragments carried by genomic libraries.


A cDNA library contains copies of expressed genes only
(genes transcribed into mRNA). The DNA copies cloned 
in cDNA libraries are typically shorter than the DNA frag- 
ments cloned in a genomic library because the noncoding 
DNA sequences such as introns are not included in the 
cDNA libraries. 


The process of copying RNA into DNA was virtually unknown until 1970, when Howard Temin and David Baltimore independently discovered reverse transcriptase (RT), an enzyme that uses RNA as a template to synthesize a complementary DNA strand. The name “reverse transcriptase” comes from the fact that RT copies RNA into DNA, the reverse of the usual process of transcription in gene expression, in which DNA is used as a template to make RNA. This pioneering research on reverse transcriptase and retroviruses led to the 1975 Nobel Prize in Physiology or Medicine, which Temin and Baltimore shared with Renato Dulbecco.

Scientists routinely use commercially available reverse transcriptase to copy mRNA molecules into single-stranded DNA copies that can then be made into double-stranded DNA and cloned into a cDNA library. Scientists often use cDNA libraries to clone copies of expressed eukaryotic genes. Genomic DNA libraries are frequently used to identify and study the genomic sequences flanking certain eukaryotic genes and the associated transcriptional control elements.

FIGURE 5.12 RNA splicing. Splicing is the process that removes introns from precursor mRNAs to produce mature mRNAs. Interrupted genes contain introns and exons, which are expressed as long precursor RNAs that contain both introns and exons. The RNA splicing process removes the introns from the precursor RNAs and produces the fully processed mRNA molecules.


Making a cDNA Library 


For a cDNA library to be scientifically useful, it is 
essential that the RNA molecules are isolated from the 
specific cell type that expresses the protein of interest. 
For example, to clone the human gene for the insu- 
lin protein, it is necessary to start the experiment with 
human pancreatic islet cells that are actively expressing 
the insulin gene. 

In the first step of the procedure, the target cells 
(such as the pancreatic islet cells, for cloning the insu- 
lin gene) are harvested, broken open (lysed), and the 
RNA is separated from the other cellular components. 
This total RNA fraction contains mRNA, transfer RNA (tRNA), ribosomal RNA (rRNA), and small nuclear (and nucleolar) RNAs; the mRNA actually represents a very small percentage of the total RNA. Ribosomal RNAs are usually the most abundant RNAs in an active cell because the cell needs to make large amounts of ribosomes to satisfy the demand for protein synthesis.


Catching mRNA by Its Tail


RNA is chemically unstable compared to DNA, and in the cell RNA is rapidly degraded by ribonucleases (RNases). Scientists developed a sterile, one-step purification process to enrich for the relatively small fraction of mRNAs present in the total RNA population without risking degradation of the RNA. This widely used procedure takes advantage of the poly-A tail attached to almost all eukaryotic mRNA molecules, which clearly distinguishes mRNA from the other types of RNA in the cell. Because essentially only the mRNAs have poly-A tails, scientists devised a way to use the poly-A tails as a handle to separate the mRNAs from the other RNAs in the total RNA fraction.

The mRNA molecules are purified using a form of affinity chromatography, which “grabs” the mRNAs by their poly-A tails. The scientist prepares a small column filled with a matrix of cellulose beads, each carrying attached single-stranded DNA made up entirely of T nucleotides, called oligo-dT (Figure 5.13). The total RNA is applied to the column in a high-salt buffer, in which the poly-A tails on the mRNAs form base pairs with the T strands on the oligo-dT cellulose, capturing the mRNAs on the column, while the RNAs lacking poly-A tails flow through the matrix of beads. Following a wash to remove any remaining non-mRNAs from the column, the mRNAs are released using a low-salt buffer, which destabilizes the hydrogen-bonded A-T base pairs between the poly-A tails and the oligo-dT. The mRNA molecules are collected in the solution as it exits the column.

FIGURE 5.13 Purification of mRNA by affinity chromatography. Scientists often want to analyze the RNAs expressed from genes in a particular cell type, where the mRNAs are usually a very minor part of the total RNA in the cells, making it advantageous to start by purifying the mRNAs away from the other RNAs. To do this, the total RNA prepared from cells stored on ice is applied to an oligo-dT cellulose column. The mRNA molecules contain poly-A tails that base pair with the oligo-dT and remain on the column, while other RNAs move through the column and are collected (called the poly-A-minus RNA fraction). Following a wash to remove any remaining non-mRNA from the column, the mRNAs with poly-A tails are eluted from the oligo-dT column with low-salt buffer and collected (the poly-A+ RNA fraction). Samples of the RNA fractions are visualized by electrophoresis on a denaturing agarose gel. The major species in the poly-A-minus fraction are the 28S and 18S rRNAs, whereas the poly-A+ fraction is enriched for the mRNAs (compared to the rRNAs). A specific poly-A+ mRNA can be detected by Northern blot hybridization. www.biochem.arizona.edu/classes/bioc471/pages/Lecture9/Lecture9.html.
The poly-A tail provides a convenient way to separate and
purify mRNAs from cells. An oligo-dT column base pairs 
to the poly-A tails on the mRNAs, while other RNAs pass 
through the column. Poly-A+ RNA can then be released 
from the column with a low-salt buffer. 
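The separation performed by the oligo-dT column can be mimicked by sorting RNA sequences according to whether they end in a poly-A stretch. This is only an analogy: the 20-nucleotide minimum tail length is an arbitrary assumption, and in a real column capture depends on base pairing under high-salt conditions rather than on a string test. The function names are hypothetical.

```python
def has_poly_a_tail(rna: str, min_length: int = 20) -> bool:
    """True if the RNA ends in at least `min_length` consecutive A residues
    (the assumed minimum tail able to pair with oligo-dT)."""
    return rna.endswith("A" * min_length)


def oligo_dt_column(total_rna: list[str], min_length: int = 20) -> tuple[list[str], list[str]]:
    """Split total RNA into (poly-A+ eluate, poly-A-minus flow-through)."""
    bound, flow_through = [], []
    for rna in total_rna:
        if has_poly_a_tail(rna, min_length):
            bound.append(rna)         # held by oligo-dT, released in low salt
        else:
            flow_through.append(rna)  # passes straight through the column
    return bound, flow_through


if __name__ == "__main__":
    mrna = "AUGGCCUAA" + "A" * 30      # toy mRNA with a poly-A tail
    other_rna = "GCGGAUUUAGCUCAGUUGG"  # toy RNA without a poly-A tail
    poly_a_plus, poly_a_minus = oligo_dt_column([mrna, other_rna, other_rna])
    print(len(poly_a_plus), "poly-A+ RNA;", len(poly_a_minus), "poly-A-minus RNAs")
```

A short run of A residues, such as the five at the end of the second test sequence, is not enough to be retained, which reflects why only the long poly-A tails of mRNAs are captured efficiently.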


The next step in constructing a cDNA library is 
to convert the mRNA of the cell into complementary 
DNA using the reverse transcriptase enzyme. The sin- 
gle-stranded DNA made by reverse transcriptase is then 
converted into double-stranded DNA in preparation for 
ligation into vectors. Like DNA polymerase, reverse
transcriptase (RT) requires a primer with a free 3’ OH
group to initiate DNA synthesis. The poly-A tail provides
a solution because the synthetic oligo-dT can base pair
to the poly-A tail and serve as the primer for RT (Figure
5.14). The RT enzyme copies the entire mRNA molecule
into a complementary DNA strand, resulting in a DNA:RNA
heteroduplex molecule: an RNA strand base paired
to its complementary DNA strand. Next, the mRNA strand
of the DNA:RNA heteroduplex is removed either by the
RNase H enzyme or by alkaline treatment, both of which
degrade the RNA strand only, leaving behind single-stranded
DNA. The primer to initiate synthesis of the second
DNA strand is provided by a short, complementary
‘hairpin’ that forms at the end of the DNA strand, or by
short random DNA primers added to the reaction.


Reverse transcriptase creates a DNA:RNA heteroduplex. 
The RNA strand is removed, and DNA polymerase uses 
the remaining DNA as a template to synthesize the second 
strand, resulting in a double-stranded cDNA molecule. 
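The base-pairing logic of cDNA synthesis can be sketched in a few lines. Reverse transcriptase reads the mRNA template 3' to 5' and builds DNA 5' to 3', so the first cDNA strand is the reverse complement of the mRNA in the DNA alphabet (a U in the RNA pairs with an A in the DNA). Copying that strand again gives a second strand that matches the mRNA with T in place of U. The short mRNA is hypothetical, and the sketch ignores the hairpin and primer details.

```python
# Pairing rules: RNA template base -> DNA base added by reverse transcriptase.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
# Pairing rules: DNA template base -> DNA base added by DNA polymerase.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the first cDNA strand (5'->3') made by reverse transcriptase."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


def second_strand_cdna(first_strand: str) -> str:
    """Return the second DNA strand (5'->3') copied from the first strand."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand))


mrna = "AUGGCCAAAAAA"          # hypothetical mRNA ending in a short poly-A tail
first = first_strand_cdna(mrna)
second = second_strand_cdna(first)
print(first)   # TTTTTTGGCCAT: the 5' T's are where the oligo-dT primer paired
print(second)  # ATGGCCAAAAAA: the mRNA sequence with T in place of U
```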


cDNA Is Different from Genomic DNA


The double-stranded cDNA molecule is different from 
genomic DNA in one very important, fundamental 
way: the cDNA sequence is a copy of a fully processed
and spliced mRNA, which means that the cDNA
sequence contains only exons and lacks the introns that 
are removed during mRNA splicing. In comparison, the 
equivalent genomic DNA contains introns and exons, 
and regions flanking the genes, including promoters 
and other control elements. The chromosome DNA 
sequences that are not expressed as RNA in the cell 
will not be included in a cDNA library. 
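The difference between a gene and its cDNA can be illustrated by splicing a made-up genomic sequence. The sequence, the exon coordinates (0-based, end-exclusive), and the helper name `splice` are hypothetical; the introns are written to begin with GT and end with AG, as most introns do. The cDNA is simply the exons joined in order.

```python
def splice(genomic: str, exons: list) -> str:
    """Join exon segments in order; everything between them (introns) is dropped."""
    return "".join(genomic[start:end] for start, end in exons)


# Hypothetical gene: exon 1 + intron 1 + exon 2 + intron 2 + exon 3.
genomic = "ATGGCC" + "GTAAGTTTCAG" + "GATTTG" + "GTGAGCCTAG" + "TAA"
exons = [(0, 6), (17, 23), (33, 36)]   # (start, end) of each exon

cdna = splice(genomic, exons)
print(len(genomic), genomic)   # 36 bases, introns included
print(len(cdna), cdna)         # 15 bases: ATG GCC GAT TTG TAA (exons only)
```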

A complementary DNA (cDNA) contains just the 
sequences that represent a copy of a mature mRNA. 
If provided with a promoter element, the cDNA can 
be used to express the protein encoded in the exon 
sequences, avoiding the need to properly process 
FIGURE 5.14 Reverse transcription of mRNA to cDNA. (a) Mature, fully processed mRNAs with poly-A tails are isolated from cells and copied
into an mRNA:cDNA heteroduplex by reverse transcriptase. (b) The RNA strand is removed from the RNA:DNA heteroduplex with alkali or
RNase H activity. (c) The cDNA hairpin loop (or added random primers) primes DNA synthesis by DNA polymerase to make a double-stranded DNA
product. (d) The hairpin loop is cut with a nuclease, leaving a double-stranded cDNA molecule that is suitable for insertion into a cloning
vector. The cDNA is a copy of the mRNA and contains only exons, not introns.


or splice the RNA product in the cells. For this rea- 
son, mammalian expression vectors are specifically 
designed to express proteins from cloned cDNA genes. 
This approach is particularly important in cases where 
expressing the genomic copy of a gene is not practi- 
cal or when the host cells do not support the neces- 
sary splicing events needed to produce fully processed 
mRNA (e.g., bacteria). Usually it is not practical to try 
to manipulate the large genomic copy of a eukaryotic
gene containing many introns and exons, which can
encompass many tens of thousands of base pairs (kilobase
pairs; kb) of DNA.


Screening a DNA Library 


Whether constructing a cDNA or genomic DNA 
library, the appropriate DNA fragments are mixed 
with the cut vector DNA and ligated together to form 
recombinant DNA molecules. The ligated DNA molecules
are used to transform E. coli cells; the transformed
cells containing the recombinant vector grow
in the presence of an antibiotic. To use a DNA library 
productively, it is then necessary to use a DNA probe 
to find the gene or sequence of interest among the 
cells that contain the cloned DNA fragments. 
FIGURE 5.15 DNA probes identify target DNA sequences in chromosomes. (A) A DNA probe is a short, single-stranded segment of DNA
that base pairs preferentially to a complementary DNA sequence. DNA probes are used to find specific DNA sequences in DNA libraries,
on chromosomes, in Southern and Northern blots, and in many other applications where a target sequence is sought. (B) An example of an
experiment using DNA probes: in this micrograph, a DNA probe has base paired (hybridized) to specific genes located on fruit fly salivary gland
chromosomes (arrows). (C) Arrows point to DNA probes base paired to specific DNA regions on the chromosomes.


DNA Probes 


Scientists use DNA hybridization probes as tools to 
identify specific target DNA sequences hidden among 
hundreds of thousands of other DNA sequences in a 
genome or library (Figure 5.15). DNA probes are used 
in many different contexts in addition to screening 
DNA libraries. For example, DNA probes can be used 
in thin sections of tissue, on chromosome spreads, or 
in microarrays (see Chapter 13) to study different gene 
expression patterns (see Figure 5.9). 

DNA probes work by hybridization, which is 
the ability of one DNA strand (in this case, the DNA 
probe) to base pair with a complementary DNA strand 
(the DNA target). To design a DNA probe, it is essen- 
tial to know at least part of the target DNA sequence. 
This sequence information can be obtained from vari- 
ous sources including DNA databases. Sometimes the 
DNA sequences used as probes are deduced from the 
known amino acid sequence of a protein of interest; 
other probes are based on the sequence of a gene that 
has a similar function to the target gene, requiring the 
use of a probe with a sequence that is homologous 
(similar) to the target gene. 
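When a probe is deduced from a protein's amino acid sequence, the redundancy of the genetic code means that many DNA sequences could encode the same peptide. The sketch below counts that number (the probe's degeneracy) from the codon counts of the standard genetic code, then slides a window along a protein to find the stretch with the fewest possibilities. Regions rich in methionine (M) and tryptophan (W), each of which has a single codon, give the least degenerate probes. The protein sequence and window length are hypothetical.

```python
from math import prod

# Number of codons for each amino acid (one-letter code) in the standard genetic code.
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2,
    "G": 4, "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2,
    "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that could encode the peptide."""
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in peptide.upper())


def least_degenerate_window(protein: str, length: int) -> tuple:
    """Return (start, peptide, degeneracy) for the least degenerate probe region."""
    windows = [(i, protein[i:i + length]) for i in range(len(protein) - length + 1)]
    start, peptide = min(windows, key=lambda w: probe_degeneracy(w[1]))
    return start, peptide, probe_degeneracy(peptide)


protein = "LSRLSMFWCYLSR"                     # hypothetical partial protein sequence
print(probe_degeneracy("LSRLS"))              # 7776 possible 15-base sequences
print(least_degenerate_window(protein, 5))    # (5, 'MFWCY', 8)
```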

A DNA probe is a short (about 20 bases), single- 
stranded DNA molecule with a chemical marker, 
fluorescent “tag”, or radioactive label that can be 
detected once the DNA probe has hybridized to the
target DNA in the library or other biological preparation.
In the early days of recombinant DNA, the only
way to label DNA probes was with radioactivity, but 
other forms of nonradioactive detection have been 
developed since then, including fluorescence and 
chemiluminescence. 


A short, single-stranded DNA molecule called a DNA probe 
is used to locate a specific gene or sequence of interest. 
DNA probes are synthesized with a chemical “tag” or label 
that is needed to detect the probe after the probe has base 
paired to its complementary target DNA. 


During a DNA library screening process (described 
later), the probe DNA is exposed to the single-stranded 
DNA of the library under conditions that vary the level 
of hybridization stringency. Under high stringency 
conditions the probe will base pair only to sequences 
that are fully complementary to the probe sequence; 
whereas low stringency allows the probe to hybridize 
to sequences that are similar to the probe but are not 
exact complements. Higher temperatures and lower 
salt concentrations increase stringency, allowing the 
probe to base pair only to sequences that are exact 
complements. 
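One common rule of thumb for how strongly a short probe holds on to its exact complement is the Wallace rule, which estimates the melting temperature (Tm) of a short oligonucleotide duplex from its base composition: Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). G-C pairs count more heavily, reflecting their three hydrogen bonds versus two for A-T pairs. The rule is only an approximation for short oligos (roughly 14 to 20 bases) in high-salt buffer, and the probe below is hypothetical; hybridizing closer to the Tm corresponds to higher stringency.

```python
def wallace_tm(probe: str) -> int:
    """Rough melting temperature (degrees C) of a short DNA probe: 2(A+T) + 4(G+C)."""
    seq = probe.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def gc_fraction(probe: str) -> float:
    """Fraction of G and C bases in the probe."""
    seq = probe.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


probe = "GCGCATATGCGCATATGCGC"   # hypothetical 20-base probe
print(wallace_tm(probe))         # 64
print(gc_fraction(probe))        # 0.6
```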



DNA and Biotechnology 


FIGURE 5.16 The replica plating technique for screening bacterial colonies. (A) Bacteria grow in colonies on solid agar in Petri plates 
(dishes). (a) A circle of sterilized velvet fabric is placed directly over the colonies. (b) Some bacterial cells from each colony are transferred to 
the fabric. (c) The fabric circle is laid onto a fresh Petri dish of agar, transferring cells on the velvet to the growth medium. (d) The plates are 
incubated and the colonies grow, making a copy, or “replica plate,” of the original plate of colonies. (B) The replica plating technique can be
used to screen for antibiotic-resistant bacteria that carry antibiotic resistance genes. A replica of bacterial colonies is made on a plate without
antibiotics (left) and on a plate containing the antibiotic ampicillin (right). All of the colonies can grow on the master plate copy lacking
antibiotics. Only the colonies that are resistant to ampicillin can grow on the replica plate containing ampicillin (circled in red), whereas 
colonies that are not resistant to the antibiotic cannot grow on the plate containing ampicillin (circled in blue). 


Replica Plating 


The purpose of a cDNA library screen is to find the 
cells that contain the target DNA so that the cells can 
be grown and the target DNA isolated or expressed. 
Because the screening procedure kills the cells, it
is necessary to keep a record that allows the
scientist to relocate the appropriate target cells. This is 
often accomplished using a replica plating method that 
creates a master plate of library colonies (Figure 5.16). 
The transformed E. coli cells are diluted and spread on 
solid nutrient agar in Petri dishes so that each bacterial 
cell has room to grow into a pure colony of millions of 
identical, cloned cells generated by reproduction of a 
single transformed bacterium. A replica of each Petri 
dish is made to identify the colonies containing
the target cloned DNA, whose live cells remain
available on the original master plate.

The replica plating technique used to make copies 
of Petri dishes containing growing microbes or phage 
plaques was originally developed in 1952 by Joshua 
and Esther Lederberg. A circle of sterile velvet cloth is 
pressed firmly onto a Petri dish of tiny bacterial colonies
(Figure 5.16A). The velvet is carefully peeled off
and then gently pressed onto the agar of a fresh Petri
dish. The velvet cloth picks up some cells from each 
colony on the first dish and transfers the cells to the 
second fresh plate, the replica plate, which now 
becomes a copy of the original plate. The colonies on 
the copy plate can be analyzed without damaging the 
original library master plates. A replica copy of the 
bacterial colonies can be made onto plates contain- 
ing an antibiotic such as ampicillin. All of the colonies 
grow on the copy of the master plate without antibiotics,
but only colonies that are resistant to ampicillin
can grow on the replica plate containing ampicillin
(Figure 5.16B). 
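Reading a replica plate amounts to comparing colony positions between the master plate and its copy. The sketch below uses hypothetical grid labels for colony positions: colonies present on the ampicillin replica are resistant, colonies present only on the master plate are sensitive, and any colony found only on the replica would point to contamination or a misaligned copy.

```python
def read_replica(master: set, replica: set) -> dict:
    """Classify colony positions by comparing a master plate with an antibiotic replica."""
    return {
        "resistant": sorted(master & replica),    # grew on both plates
        "sensitive": sorted(master - replica),    # grew only without the antibiotic
        "unexpected": sorted(replica - master),   # should be empty for a good replica
    }


# Hypothetical colony positions on the master plate and the ampicillin replica.
master_plate = {"A1", "A2", "B1", "B2", "C1"}
ampicillin_replica = {"A2", "C1"}
print(read_replica(master_plate, ampicillin_replica))
```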

The replica plating technique is still important in 
modern molecular genetics labs for screening DNA 
libraries (Figure 5.17A-D). Special materials that bind
DNA tightly, such as nitrocellulose filters or nylon
membranes, are used in place of velvet to make replica
copies of the cells on a Petri dish. The replica filters
are then soaked in a sodium hydroxide solution at a 
high temperature (65°C) to break open the cells in the 
colonies. Importantly, the DNA remains fixed to the 
filter or membrane at the same spot as previously 
FIGURE 5.17 Screening a DNA library. (A) Bacterial cells containing the cloned gene are plated onto nutrient agar, where they form colonies.
(The same approach can be used to screen the DNA or RNA in plaques made by viruses or phage.) (B) A circle of nitrocellulose filter
paper is gently applied to the surface to obtain a replica copy of the colonies on the plate. (C) The nitrocellulose paper is peeled off the agar
surface complete with bacteria transferred to the filter from the colonies. (D) The cells on the filters are disrupted in place and the DNA on 
the filter is separated into single strands that remain on the filters. The treated filters are placed in a sealed plastic bag in a hybridization buffer 
solution containing a single-stranded DNA probe that will specifically base pair with a short segment of the target DNA in the very few colo- 
nies in the library where the DNA fragment is present. (E) The filters are exposed to x-ray film (or are processed appropriately for alternative 
systems) to detect the DNA probe and consequently the target DNA. (F) Dark areas appear on the x-ray film where positive clones have emit- 
ted radioactivity. (G) The x-ray film is compared with the original master plate to determine which colonies contained the DNA that emitted 
the radioactivity. These are colonies of bacteria carrying the target DNA sequence. 


occupied by the cells. The solution also denatures 
the DNA (separates the two strands) so that the sin- 
gle-stranded DNA remains fixed to the filter in the 
exact same pattern as the cells on the original plate. 
These “filter replicas” of all the plates in a DNA library 
are then screened with DNA probes to identify the 
location(s) of potential positive clones. 


Finding the Target DNA 


The replica filters are placed in a sealed plastic bag in 
a solution of buffer containing millions of copies of a 
specific DNA probe (Figure 5.17D). The DNA probe has 
access to the single-stranded DNAs located on the filter 
circles in the positions previously occupied by the colonies.
The probe will bind specifically only to complementary
DNA sequences on the filters. Under conditions
of high stringency, the probe will base pair to its exact 
DNA complement; under low stringency conditions, the 
probe will base pair to similar sequences as well. 
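The effect of stringency can be caricatured by counting mismatches. In the toy model below, a probe "hybridizes" to a target strand if some stretch of the target differs from the probe's reverse complement at no more than `max_mismatches` positions; high stringency corresponds to allowing zero mismatches, and low stringency to allowing a few. The cutoffs and sequences are hypothetical, and real hybridization depends on temperature, salt, probe length, and where the mismatches fall rather than on a simple count.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def best_site_mismatches(probe: str, target: str):
    """Fewest mismatches between the probe's perfect binding site and any stretch of the target."""
    site = reverse_complement(probe)   # the sequence the probe pairs with perfectly
    k = len(site)
    counts = [
        sum(a != b for a, b in zip(site, target[i:i + k]))
        for i in range(len(target) - k + 1)
    ]
    return min(counts) if counts else None


def hybridizes(probe: str, target: str, max_mismatches: int) -> bool:
    """Toy rule: the probe binds if its best site has at most max_mismatches mismatches."""
    mismatches = best_site_mismatches(probe, target)
    return mismatches is not None and mismatches <= max_mismatches


probe = "ATGGCC"
exact = "TTGGCCATTT"      # contains GGCCAT, the perfect complement of the probe
similar = "TTGGACATTT"    # one base differs from the perfect site
unrelated = "AAAAAAAAAA"

for label, max_mm in [("high stringency", 0), ("low stringency", 2)]:
    print(label, [hybridizes(probe, t, max_mm) for t in (exact, similar, unrelated)])
```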

The filters are removed from the bag and any unbound or
nonspecifically bound probe is washed away. Then, if the probe is
radioactive or uses a light detection system, the dried
filters are placed against a piece of x-ray film in the
dark so that any light signals or radioactive emissions
from the filters will create dark spots on the film (Figure
5.17E-G). Each dark spot marks the position on the
filter where the DNA probe base paired with the target 
DNA and thereby indicates the position of that specific 
positive colony on the corresponding original plate. 
The live cells on the original plate can be harvested 
and grown to continue the process of obtaining the 
target DNA. 
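Matching film spots back to colonies is a coordinate lookup. The sketch below assumes the film has already been aligned with the master plate, so that spot and colony positions share hypothetical millimeter coordinates. Each spot is assigned to the nearest colony, and spots farther than a hypothetical tolerance from every colony are left unassigned for the scientist to inspect.

```python
import math


def match_spots(spots: list, colonies: dict, tolerance_mm: float = 1.5) -> dict:
    """Map each dark spot (x, y) to the nearest colony within tolerance_mm."""
    matches = {}
    for spot in spots:
        name, position = min(colonies.items(), key=lambda item: math.dist(item[1], spot))
        if math.dist(position, spot) <= tolerance_mm:
            matches[spot] = name
    return matches


# Hypothetical colony positions (mm) on a master plate and spots on the aligned film.
colonies = {"colony_1": (10.0, 12.0), "colony_2": (25.0, 30.0), "colony_3": (40.0, 8.0)}
spots = [(25.4, 29.7), (60.0, 60.0)]
print(match_spots(spots, colonies))   # {(25.4, 29.7): 'colony_2'}
```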

Once positive colonies have been identified on the 
master plates, the cells from each “positive” colony are 
carefully transferred from the master plate into sepa- 
rate liquid cultures for further study and for storage. 
The methods for cultivation and maintenance of most 
host organisms are well established, and instructions 
are provided by commercial sources of DNA librar- 
ies. Scientists often purchase a commercially prepared 
cDNA library instead of making the library in the lab. 
An expensive library can be rendered useless if the
cells are grown incorrectly, because this can allow
a subpopulation of cells carrying vectors without inserts
to overgrow the total library cell population. For this reason
it can be worth the investment to either purchase a
well-characterized library or contract with a company 
to create specialized libraries from RNA preparations 
provided by the client. 
Cells containing DNA libraries have been kept frozen
for years, or for decades when lyophilized (freeze-dried).
To grow more library cells without thawing the original
frozen culture, which avoids the danger of contamination,
a few ice crystals containing cells are scraped off
the top of the frozen culture with a sterile toothpick
and spread out to thaw on a sterile agar dish. The fro- 
zen cells begin to grow once again, providing cloned 
genetic information to be retrieved when needed by 
the DNA technologist. 


EXPRESSING CLONED GENES 


Once a gene has been identified, isolated, and cloned 
from the appropriate genome or identified in a DNA 
library, the next step could involve expressing the pro- 
tein encoded by the gene. To accomplish this feat, a 
copy of the gene is inserted into an expression vector 
that is designed to promote transcription and transla- 
tion, producing the encoded protein in the cell. A key 
question in protein expression experiments involves 
which type of host cell will best express the protein 
product, and there are many choices. Many mammalian
proteins are expressed in bacteria, but bacterial
cells do not carry out most of the posttranslational
modifications that are common in eukaryotic proteins.
Eukaryotic tissue culture cells, insect cells, and yeast
cells provide posttranslational modification to different
degrees.

Regardless of the source of the gene under study, the 
choice of expression vector will depend on the charac- 
teristics of the recipient cells. For example, expression 
vectors contain different types of promoter sequences 
to ensure that the inserted gene (or cDNA) will be 
transcribed properly in the host cells. (Promoters are 
naturally occurring DNA sequences that are usually 
located before the start of a gene and help to govern 
the frequency and level of transcription of the gene; 
see Chapter 3.) There are different types of promoter 
sequences, and the promoters used in prokaryotic cells 
are very different from the eukaryotic promoters. 

Basic types of promoters used to drive transcrip- 
tion include constitutive promoters, which are “on” 
all the time and lead to continual expression of the 
gene. Constitutive promoters usually express high 
levels of a desired protein, but they do not shut off, 
which may have damaging consequences to the host 
cells. Inducible promoters regulate gene expression by 
responding to signals; in vivo, the signals come from 
the metabolism or environment of the cell, and in vitro, 
the scientist supplies the signal in the form of a chemi- 
cal or other stimulus whenever expression is desired. 
Such signals might include the presence or absence of 
a small molecule in the growth medium or the presence
or absence of a phosphorylated protein.


Expression vectors have promoters to drive gene expres- 
sion: either constitutive promoters, for a constant level of 
expression, or inducible promoters that can be turned on 
or off in response to various cellular signals. 


Collecting Cloned Protein Products 


When expressed in eukaryotic host cells, cloned genes 
are transcribed along with the endogenous genes in 
the nucleus. The mRNAs are transported out of the 
nucleus and into the cytoplasm for translation into 
proteins by the ribosomes. Expression vectors often 
promote high levels of gene expression, so the mRNAs 
and proteins build up in the host cells. Because the 
recombinant gene product is foreign to the cell pro- 
ducing it, the host cells sometimes destroy the foreign 
protein. Even simple bacteria make protease enzymes 
that degrade foreign proteins inside the cells. To avoid 
having the desired cloned protein degraded by host 
cell enzymes, scientists constructed host bacterial 
strains that are genetically deficient in the major host 
protease enzymes. 

When expressed in bacterial cells, the cloned foreign 
proteins sometimes accumulate in insoluble masses 
called inclusion bodies. Although inclusion bodies are 
an artifact of protein accumulation, they can be useful 
in protecting the newly synthesized proteins from pro- 
tease attack, and they can be easily harvested from the 
bacteria. Insoluble protein can be used to make anti- 
bodies against the cloned protein, which in turn can 
be used to track protein expression in various ways. 
To be functional, however, cytosolic proteins must be 
rendered soluble and refold into the three-dimensional 
shape required for activity, and after being in an insol- 
uble state, many proteins cannot refold correctly and 
are nonfunctional. Other cloned proteins are soluble
when expressed in the bacterial cells and can be purified
using strategies such as protein affinity
chromatography.

Many different types of protein expression vectors 
offer the efficient production of cloned protein prod- 
ucts. In some cases the cloned proteins are conveniently 
secreted out of the host cell and can be purified from 
the medium. Genes encoding proteins that are normally 
secreted from the cell contain a DNA signal sequence 
and encode a corresponding amino acid signal peptide 
located at the beginning of the synthesized protein. The 
signal peptide ensures that the protein will be sent along 
the correct pathway to be secreted from the cell. A sig- 
nal sequence is included on many expression vectors so 
FIGURE 5.18 Baculovirus is a very efficient protein expression system. (A) The rod-shaped baculovirus. (B) Insect cells infected with baculovirus
growing in tissue culture. (C) Rod-shaped baculoviruses are enveloped in the cell nucleus and covered by a protein matrix. (D) Larvae
(caterpillars) infected with baculovirus vectors produce the largest amounts of a target protein. 


that the resulting protein, whether originally destined for 
secretion or not, can be efficiently targeted for secretion 
from the cell, potentially increasing the yield and ease 
of protein recovery. The signal peptide is normally
cleaved off by the host cell as the protein is secreted.


Cloned proteins are usually foreign to the host cell and may 
be degraded or, in bacteria, may form insoluble inclusion 
bodies of inactive protein. Targeting a protein for secretion 
from the host cell is one way to avoid these problems. 


Baculovirus (Figure 5.18) normally infects insect 
cells but recently has been used to transfect mamma- 
lian cells as well. It is useful for expressing protein in 
eukaryotic cells because of the large amount of protein 
that can be obtained and because the host cells per- 
form the most important eukaryotic posttranslational 
processing steps that are lacking in E. coli and other 
types of prokaryotic host cells. In the wild, the rod-shaped
baculovirus attacks more than 500 different
species of insects. Baculovirus vectors are particularly
good at producing large amounts of a cloned protein in 
the insect larvae, exceeding the productivity of foreign 
proteins expressed from vectors in mammalian cells. 


THE POLYMERASE CHAIN REACTION (PCR) 


Kary Mullis developed PCR in 1983 while working for
the Cetus Corporation, and he was awarded the Nobel 
Prize in chemistry in 1993 for this achievement. PCR 
allows a scientist to amplify a single DNA molecule 
into a billion identical copies of DNA in a few hours, 
using just DNA primers and a special DNA polymerase 
enzyme. Since its discovery, PCR has become the central
method used in literally thousands of diagnostic
tests in genetics and medicine, and in DNA fingerprinting
and forensic analyses, with additional applications
continually being developed (see Chapters 8, 10).

PCR can start with a tiny amount of DNA, even as lit- 
tle as the chromosomal DNA in a single hair follicle. PCR 
is catalyzed by remarkable thermostable (heat-stable) 
enzymes that survive the extremely high temperatures
used to separate the DNA strands between PCR cycles. 
PCR is a rapid way of “cloning” identical DNA fragments 
directly from a genome DNA preparation without using 
a vector or host cells and can be used to prepare the 
DNA probes used in DNA screening to identify a clone 
in a library. As long as the scientist has a sample of the 
DNA of interest and knows some part of the target DNA 
sequence or neighboring sequences, PCR can shorten the 
steps in recombinant DNA cloning procedures. As we 
will see, PCR permits the scientist to locate the molecu- 
lar equivalent of a DNA needle in a veritable haystack of 
different DNA sequences. 

PCR is not without problems, and chief among
these is the risk of contaminating the PCR sample with 
extraneous DNA. As with any extremely powerful tool, 
PCR can be misused and misinterpreted; if the sam- 
ple DNA is contaminated, any DNA sequences that 
are coincidentally complementary to the PCR primers 
will be amplified along with the target DNA, ending 
up as a large fraction of the final PCR DNA product. 
To eliminate contamination artifacts, great care must 
be taken in preparing samples for PCR testing. Such 
preparation tends to be labor intensive and costly. The 
development of uniform rules for handling DNA and 
specialized equipment have reduced the contamina- 
tion problem significantly. Commercial PCR kits are 
readily available and are very useful in preparing the 
necessary reagents and enzymes to perform PCR in 
research and many other applications. 


PCR is used to amplify a target DNA sequence that is 
present in tiny amounts—as little as a single molecule— 
while surrounded by large amounts of nontarget DNA. It 
is critical to avoid contamination with extraneous DNA in 
order to achieve reliable results with PCR. 


PCR Cycles and DNA Amplification 


To amplify a region of DNA using PCR, the following 
biological reagents are required (in addition to a PCR 
thermocycling machine): 


• A DNA molecule (usually double stranded) containing
the target sequence to be amplified.

• Many copies of two short DNA “primers”, which are
complementary to the DNA sequences flanking the
target DNA. The primers are used by the enzyme to
initiate DNA replication (Figure 5.19).

• A thermostable DNA polymerase enzyme. The best-known
thermostable polymerase used in PCR is Taq polymerase,
which is derived from the thermophilic (heat-loving)
bacterium Thermus aquaticus, first isolated by
Thomas Brock in the 1960s from hot springs in
Yellowstone National Park.
FIGURE 5.19 The polymerase chain reaction (PCR). (A) Heat is used
to separate the double-stranded (ds) DNA into single strands of DNA.
Cycle 1: Nucleotides, a heat-stable polymerase, and DNA primers
are mixed together. (B) The polymerase extends the primers and
produces two dsDNA molecules. (C) The process is repeated; at the end
of cycle 2, four dsDNA molecules are present. Repeating the process
in cycle 3 yields a total of eight dsDNA molecules. The short,
target-length products accumulate exponentially in later cycles,
amplifying the target DNA.


• Nucleotides to build new DNA strands during DNA
replication.

• A buffer solution containing the required reagents
and chemical environment for the synthesis of DNA.


PCR DNA amplification involves three steps that 
are performed as a continuous cycle (Figure 5.19) in a 
PCR machine: 

First, a tiny amount of solution containing the target 
double-stranded DNA sample is heated, which breaks 
the hydrogen bonds between the base pairs, denatur- 
ing double-stranded DNA molecules into two separate 
single strands. 

Second, the temperature is lowered so that the single-stranded
DNA primers, which are already present in excess in the
reaction mixture and are complementary to specific base sequences on both
sides of the target DNA sequence, can base pair to the 
single-stranded target DNA. The primers essentially 
function as “start” and “stop” signals during each round 
(cycle) of DNA synthesis, acting as molecular brackets 
to identify the region of the DNA molecule to be copied 
in the amplification step (see Figure 5.19). The primers 
greatly outnumber the target DNA strands, so the target 
strands do not base pair with each other at this stage. 
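How two primers "bracket" the region to be copied can be shown with simple string matching. In this simplified sketch the template is written as its top strand (5' to 3'). The forward primer matches the top strand directly, while the reverse primer pairs with the top strand, so its reverse complement appears in the top-strand sequence. The PCR product runs from the start of the forward primer to the end of the reverse primer's site. The sequences are hypothetical, and only exact matches are considered.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def pcr_product(template: str, forward: str, reverse: str):
    """Top-strand sequence of the PCR product, or None if a primer site is missing."""
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)          # where the reverse primer pairs
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    return template[start:site + len(reverse_site)]


template = "CCCC" + "ATGCGTACGT" + "TTTTAAAACCCC" + "GGATCCTTAG" + "CCCC"
forward_primer = "ATGCGTACGT"
reverse_primer = "CTAAGGATCC"   # reverse complement of GGATCCTTAG

product = pcr_product(template, forward_primer, reverse_primer)
print(len(product), product)    # 32 bases, from the forward primer to the reverse site
```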

Third, the thermostable Taq DNA polymerase initiates
DNA synthesis at the primers that are base paired
to the template DNA, synthesizing the complementary 
DNA strand and thus making a double-stranded DNA 
molecule (Figure 5.19). Using a heat-stable polymer- 
ase enzyme is very important because the heat used to 
denature the DNA strands in each cycle would destroy 
an ordinary enzyme, requiring that the enzyme be 
replaced after each heating step, making PCR expen- 
sive and impractical. 

At the end of step 3, two double-stranded DNA 
molecules exist where there had been just one start- 
ing DNA molecule. As the PCR process continues, the 
cycle (steps 1, 2, and 3) is repeated 30 to 60 times, 
each cycle beginning with a reheating of the DNA mix- 
ture to denature the DNA into single strands. A full PCR 
cycle takes one to two minutes, and in each cycle, each 
newly synthesized DNA molecule serves as an addi- 
tional template for DNA synthesis. Thus, the number 
of DNA copies increases exponentially and millions of 
identical copies of the same DNA strand are made, all 
starting from as little as one DNA helix molecule. 
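The arithmetic of exponential amplification is worth making explicit. If every template is copied in every cycle, n cycles turn N0 starting molecules into N0 × 2^n copies; a common generalization uses a per-cycle efficiency E between 0 and 1, giving N0 × (1 + E)^n. The functions below are an illustrative sketch, and the 90% efficiency is an assumed value, not a measured one.

```python
def ideal_copies(cycles: int, start: int = 1) -> int:
    """Copies after perfect doubling in every cycle: start * 2**cycles."""
    return start * 2 ** cycles


def copies_with_efficiency(cycles: int, efficiency: float, start: int = 1) -> float:
    """Copies when only a fraction `efficiency` of templates is copied per cycle."""
    return start * (1 + efficiency) ** cycles


def cycles_to_reach(target: int, start: int = 1) -> int:
    """Smallest number of perfect doubling cycles giving at least `target` copies."""
    cycles, copies = 0, start
    while copies < target:
        copies *= 2
        cycles += 1
    return cycles


print(ideal_copies(30))                          # 1073741824, about a billion
print(cycles_to_reach(10**9))                    # 30
print(f"{copies_with_efficiency(30, 0.9):.2e}")  # roughly 2.3 x 10^8 at an assumed 90% efficiency
```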


PCR consists of steps that specifically replicate a target DNA 
sequence exponentially, resulting in millions of copies of the 
sequence in a few hours, effectively cloning the target DNA 
sequence without vectors or host cells. 


The usefulness of PCR extends beyond making mil- 
lions of identical DNA copies in a lab. PCR technology 
is fundamental to the development of many important 
methods such as making specific DNA mutations, diag- 
nosing diseases, detecting infections, identifying indi- 
viduals, and classifying pathogenic organisms, to name 
just a few of the many applications of PCR technology 
in modern science. All these applications rely on the 
specificity of DNA base pairing in PCR—the ability to 
bracket a sequence of interest accurately and replicate 
it faithfully and quickly. 


SUMMARY 


DNA is cloned routinely in laboratories where scien- 
tists are doing research on the fundamentals of genes 
and cells, as well as in labs that apply science to 
practical goals such as discovering new medicines.
Coaxing a cell to manufacture a desired gene product 
often begins with the isolation of the gene. The gene 
is cut from the genome by restriction enzymes and is 
then ligated into a molecule called a vector, which 
carries the foreign DNA into the host cell. The vector
must have DNA sequences that enable it to replicate
in the host cells, appropriate restriction sites for
cloning, and an antibiotic resistance gene that allows
host cells carrying the vector to survive antibiotic selection.

So far, the E. coli bacterium is the most widely used 
host cell for routine DNA cloning experiments and 
for many protein expression strategies as well. The 
bacterium B. subtilis and the single-celled eukaryotic 
yeast S. cerevisiae are also commonly used for pro- 
tein expression. More complex eukaryotic cells can be 
used as hosts, although these cells are more difficult 
to grow in the lab and to store for long periods of time 
unchanged. cDNA has become an indispensable tool 
for studying eukaryotic genes and expressing eukary- 
otic proteins. In some cases it is possible to engineer 
the bacteria to export the gene product out of the cell; 
many expression vectors therefore contain protein 
secretion signal sequences. 

DNA libraries are collections of host cells that act 
as storage systems for whole genomes (genomic DNA 
libraries) or for sets of genes that are expressed from a 
particular cell type (cDNA libraries). Genomic libraries
contain virtually all the DNA of an organism, coding 
and noncoding, whereas cDNA libraries contain DNA 
copies of the mRNAs expressed at the time the target 
cells were harvested, and no other DNA sequences. 
Searching a DNA library for a sequence of interest 
requires a DNA probe that will bind specifically to the 
desired target DNA. 

Under some circumstances, when at least part of
a target DNA sequence is known, PCR is the technique
of choice for amplification and cloning of DNA
sequences. PCR is more than just another tool in the 
DNA technologist’s toolbox. The impact of this tech- 
nology has become so widespread that PCR is routine 
in labs big and small around the world. 


REVIEW 


To test your knowledge of the chapter’s contents, 
consider the following review questions: 


1. List three naturally occurring cellular processes 
that scientists have adapted for use in recombinant 
DNA technology, and explain how each one is 
used in DNA cloning. 

2. Name three possible DNA vector types, and 
describe the advantages and limitations of each as 
vectors. 
3. Describe a method used to introduce recombinant
DNA into E. coli cells.

4. E. coli has been called the “workhorse” of molecular
genetics. What features made this bacterium
such a useful laboratory organism?

5. Explain the differences between a genomic DNA
library and a cDNA library.

6. How is a cDNA molecule different from the
original gene in the DNA genome?

7. Explain the important characteristics of a DNA
probe.

8. Explain in what circumstances a researcher would
decide to use a cDNA library instead of a genomic
library to find a gene.

9. In screening a DNA library, a scientist forgets to
denature the DNA on the replica filters but continues
on and finishes the experiment. What will the
scientist see on the x-ray film?

10. Imagine that your PCR machine has completed
three PCR cycles, which amounts to nine steps
of PCR (three PCR steps, repeated three times).
Assuming that you started with a single target
DNA molecule, how many copies of the single
target DNA molecule will you have? Draw and
label the target DNA, primers, and newly synthesized
strands as they appear after the first, fourth,
and seventh steps.
Monogamy Gene Links Men’s DNA to Happily Ever After in Marriage


Bloomberg.com, www.bloomberg.com/apps/news?pid=20601124&sid=a5kGdZ7L7vMl&refer=home

By Michelle Fay Cortez, 2008 

Traits like faithfulness and commitment in married men
may depend on the same gene that keeps prairie rodents
committed to their mates, according to studies on the
“monogamy gene.” Early studies on small rodents called
voles indicated that levels of the hormone vasopressin
(made in the hypothalamus) influenced monogamous and
polygamous behavior. In 2002, Larry Young, a
behavioral neuroscientist at Emory University (Atlanta),
reported on the highly developed social behaviors of
prairie voles and mountain voles (Figure 6.1). The two
voles look very similar, but they exhibit very different
“family values.” Prairie voles are very sociable, mate for
life, and share care of the young. Mountain voles, on the
other hand, are typically solitary and promiscuous, and
only the mother cares for the young.


FIGURE 6.1 The vole story of the monogamy gene. The prai- 
rie voles and mountain voles look very similar, but they have 
very different “family values”: prairie voles are sociable, they 
mate for life, and parents share care of the young. Mountain 
voles are solitary, promiscuous, and the mother alone cares for 
the young. 
In 2008, Swedish scientists (Karolinska Institute) extended
this work to humans and found evidence that vasopressin
may also contribute to a happy marriage in men. The
researchers performed genetic tests on male participants
and conducted surveys to gather personal information
about them. The study showed that men carrying one or
two copies of a DNA variant in the vasopressin control
gene scored much lower on a scale measuring partner
bonding. Even in humans, it seems that complex social
behaviors such as personal relationships are influenced
by biology and genes. Variations in the vasopressin control
gene have also been implicated in the social deficits of
autism, suggesting a possible connection between autism
and a gene involved in forming strong personal relationships.

A major author of the study, Paul Lichtenstein, strongly
cautioned people not to over-interpret these results, insisting
that this gene alone can’t predict successful relationships:
“It gives you a predisposition, but it doesn’t determine how
successful you will be in marriage.” In the overall population,
people with this gene variant will probably tend to
have more trouble in their marriages, but genotype alone
cannot be used to predict complex traits such as marriage
potential, because many other factors contribute to
complex human behaviors.

We do not yet understand how variations in vasopressin
signaling affect biological processes in the human brain
to influence bonding and relationships. In voles, social
interactions between partners activated gene expression
in the reward and reinforcement centers of the brain,
regions that also control addiction. In humans, vasopressin
regulates water retention by the body and blood pressure,
and in the brain it is also linked to aggression.


LOOKING AHEAD 


This chapter deals with the monumental effort made 
by scientists to learn the DNA base sequences of all 
46 chromosomes and to pinpoint the locations of all 
the genes in the human genome. On completing this 
chapter, you should be able to do the following: 


• Explain the most surprising features of the human
genome revealed by its DNA sequence.

• Describe the technical and ethical challenges that
faced the Human Genome Project scientists.

• Describe how geneticists and DNA technologists
go about developing gene linkage maps and physical
maps of genomes such as the human genome.

• Summarize the processes by which the sequence of
bases is determined in a particular gene.

• Identify the organisms whose genomes have already
been sequenced and relate the surprising facts
revealed by each new DNA genome sequence.

• Describe how the sequence of the dog genome
helped us to understand more about certain human
diseases.


INTRODUCTION 


On a January afternoon in 1989, a group of biolo- 
gists, ethicists, industry scientists, engineers, and 
computer experts gathered in a conference room at 
the National Institutes of Health (NIH) and listened to 
Norton Zinder, a molecular biologist from Rockefeller 
University (New York), announce the official
beginning of the most ambitious scientific endeavor
ever undertaken. “Today we begin,” declared Zinder.
“Today we are initiating an unending study of human
biology. Whatever else [happens] it will be an 
adventure, a priceless endeavor.” 

Those words launched the Human Genome Project 
(HGP), a monumental scientific effort that rivaled in 
scope and importance the Apollo program, which put 
humans on the moon. The goal of the HGP was to 
determine the specific sequence, or the “order,” of every 
DNA letter (base) in the human genome. This effort was 
of paramount importance because it gave scientists 
access to the human master plan, all the genetic infor- 
mation needed to make a human being. The success of 
this ambitious goal has tremendous implications for the 
future, and represents an awesome gift for our children 
and our children’s children. 

The typical adult human body is made up of about 
220 different types of cells. Each cell in the body con- 
tains chromosomes that carry human genome DNA. 
All the cells in the same human body carry the same 
human genome DNA. Each human chromosome con- 
tains one very long, linear double-stranded DNA helix 
molecule extending from one end of the chromosome to 
the other end. As discussed earlier (see Chapters 2 and 
3), the genetic instructions in the human genome are 
written in the DNA molecular letters A, G, C, and T, and 
are arranged in DNA sentences that convey the genetic 
information needed to make specific proteins to cre- 
ate, build, and maintain each unique, individual human 
being (Figure 6.2). 

In this chapter we will find out about the amazing 
Human Genome Project, the people involved, and the 
secrets revealed. The goal of the HGP was to determine
the DNA sequence of the entire “human genome.”
Each person’s genome is created when a sperm fertilizes
an egg. The resulting fertilized egg, the zygote, inherits two copies of
the human genome, one from Mom (in the form of 23 
chromosomes) and one from Dad (23 chromosomes). 
The two parental human genomes are brought together 
to form the offspring’s new human genome (46 chromo- 
somes) (Figure 6.3). 
FIGURE 6.2 Human genome instructions, written in the DNA language
and expressed as proteins. (A) The human genome instructions
are written in the DNA language and carried in chromosomes. (B) 
Genes are written in DNA letters (A, G, C, and T). (C) Genes encode 
information to direct the synthesis of proteins. The proteins interact 
with each other and with other components in the cell to build the 
diverse molecular machines needed to create a unique human being. 


The goal of the Human Genome Project was to deter- 
mine the DNA sequence of the human chromosomes, 
giving the world access to the human DNA master plan, 
the genetic information needed to build a living human 
being. 


MODEL ORGANISMS ARE FUNDAMENTAL 
TO GENOMICS 


Major advances in automated DNA sequencing tech- 
nologies and computer software innovations have 
rapidly increased the number of genome sequences 
available for study. Now, the genomes from many dif- 
ferent organisms are accessible online with instant links 
to information on genes, DNA mutations, and related 
diseases; just click on your favorite organism’s genome 
sequence and reveal the secrets written in the DNA. 

Scientists studying genomes other than the human 
genome made many important contributions to the 
eventual success of the HGP. Research on smaller 
genomes helped scientists to develop the technologi- 
cal advances required to determine the DNA sequence 
of the much larger human genome years later. For decades
before the HGP began, scientists were studying
genomes from prokaryotic organisms such as the
bacterium E. coli, and more recently the genome of the
white lab mouse was sequenced (Figures 6.4 and 6.5).
FIGURE 6.3 Humans inherit 23 chromosomes from Mom and 23
chromosomes from Dad. (A) Each egg and sperm cell contains 23
chromosomes (different chromosomes are colored blue and yellow).
(B) When a human sperm fertilizes an egg, a total of 46 chromosomes
(two copies of each of 23 chromosomes) are inherited by the
zygote. (C) Human biological sex is determined by the X and Y
chromosomes: XX (female), XY (male).


The budding yeast Saccharomyces cerevisiae is a 
model single-celled eukaryote that is essential for modern 
research in genetics and cell biology (Figure 6.6). People 
in the ancient world used natural breeding methods to 
produce strains of yeast cells that make bread and wine. 
Research on the nematode roundworm Caenorhabditis 
elegans has had a huge impact on our understanding 
of nerve cell development in all organisms including 
humans. The origin, development, and fate of all the cells 
in the adult C. elegans worm have been traced from the 
beginning of the embryo to the adult, providing scientists 
with insights into the development of the worm’s body 
systems, including nerve development (Figure 6.7). The
fruit fly, Drosophila melanogaster (Figure 6.8), and the
zebra fish, Danio rerio (Figure 6.9), are both key model
systems that are central to research on cell differentiation
and specialized cell development. The major plant
systems used in research are mustard weed (Arabidopsis
thaliana) and corn (Zea mays) (Figure 6.10).

FIGURE 6.4 E. coli is a model prokaryotic organism; it
lacks a nucleus. This E. coli cell has almost finished dividing into two
cells. The dark staining material around the inside perimeter of each
cell is the circular E. coli DNA chromosome that is associated with
the inner cell membrane.

FIGURE 6.5 Comparison of human and mouse chromosome DNA.
The DNA sequences of human and mouse chromosomes are compared.
The colored blocks represent segments of the human genome
containing at least two genes that are in the same order in the mouse
genome and in the human genome. Each color corresponds to a specific
mouse chromosome as indicated by the color key at the bottom
of the figure. Shown in black are the centromeres, pericentric regions,
and the repeated DNA on some short chromosome arms.

FIGURE 6.6 The budding yeast, Saccharomyces cerevisiae, is a
model organism used in many areas of basic research. The two budding
yeast cells shown contain vacuoles (red), which are special
compartments that transport proteins and lipids in the cell.

In 1996, genome research changed our understand- 
ing of the organization of life on earth (see Chapter 7, 
Figure 7.3). The existence of a third major branch of life, 
the archaea (halophiles and thermophiles), in addition to 
the bacteria (cyanobacteria and heterotrophic bacteria) 
and eukaryota (plants, animals, fungi) branches, was confirmed
by DNA sequence analysis of the genome of
Methanococcus jannaschii, a thermophilic organism found
growing in deep-sea hydrothermal vents. By the early 2000s,
many more genomes had been sequenced, including genomes
from fish, cats, and catfish.


Even before the start of the HGP, scientists studied and 
sequenced genomes from many model organisms including 
bacteria, single-celled eukaryotes, plants, and even some
simple animal genomes. 


Dog Genes Hold the Secrets to Many 
Human Diseases 


The DNA sequence of the dog genome was completed
by the National Human Genome Research Institute
(NHGRI) and released in 2005. The dog genome
sequence was particularly important to scientists who
are searching for genes involved in cancer and other
human diseases. At least half of the 300 genetic diseases
inherited by dogs also affect humans, suggesting that
these diseases probably involve similar genes in both
dogs and humans. Genetic diseases are more common in
certain purebred dogs, such as kidney cancer in German
shepherds and eye problems in border collies. As they
learn more about the genomes of purebred dogs, scientists
have been able to use this information to identify
genes involved in these disorders in dogs and humans.

FIGURE 6.7 Development of the C. elegans
embryo. (A) Original 3D image of cells in a
C. elegans embryo (top). Surface rendering of
the segmented C. elegans embryo with cells
randomly colored (bottom). (B) Adult nematode
worm swimming (top). Adult C. elegans
worm containing cells in the nervous system
that are expressing green fluorescent protein
(GFP) (bottom).

FIGURE 6.8 Gene expression patterns in fruit fly embryo. (A) This
Drosophila embryo shows the gene expression patterns early in
embryo development. Each color represents the production of one of
these proteins: Knirps (green), Kruppel (blue), and Giant (red). The
yellowish areas indicate the cells that express both Knirps and Giant.
The darker areas of the embryo contain cells that are not expressing
any of these three genes. (B) Diagram of a Drosophila embryo with
the sections labeled. (C) Adult fruit fly.

FIGURE 6.9 Zebra fish are model organisms for research on cell
and tissue development. (A) Baby zebra fish. (B) Adult zebra fish.

Dogs are an amazing species; they exhibit more 
diversity in body size and characteristics than any other 
mammal. Just note the dramatic size difference between 
an Afghan hound and a Chihuahua (Figure 6.11). 
FIGURE 6.11 Dogs exhibit the greatest diversity in size and shape
of all mammals. (A) These Afghan and Chihuahua dogs illustrate the 
dramatic size differences between breeds. (B) The DNA genome 
from the boxer, Tasha, was sequenced. 


This is not the case for people; adult humans tend to
grow to about the same height, between 5 and 6 feet
tall. Different breeds of dog vary in size and exhibit
diverse body characteristics, but adult dogs of the same
breed usually grow to about the same height. For example,
adult Golden Retrievers typically stand between 20
and 24 inches at the shoulders.

To understand more about the genes that control
height in dogs, scientists have focused on the insulin-like
growth factor (IGF) gene, which determines overall
body size in mice. Scientists decided to study Portuguese
water dogs (the same breed as President Obama’s dog,
Bo) because this breed normally produces both small-
and large-sized adult dogs. They analyzed the DNA
genomes of 463 Portuguese water dogs and found two
different forms (alleles) of the dog IGF gene, the “small”
dog IGF allele and the “large” dog IGF allele, which
differ by only two DNA base pairs. They discovered a
correlation between IGF alleles and body size: almost all
of the small Portuguese water dogs carried the small dog
IGF allele, whereas the large Portuguese water dogs
carried the large dog IGF allele. This result was
supported by a study of 3000 dogs from 143 different
breeds, which showed that the IGF gene also
controls height in many other breeds. It may seem
surprising that a single gene can have such a large
impact on the height of dogs, especially since we know
that mammalian body development is a complicated
process requiring many different genes and proteins.
Whereas the IGF gene determines the adult height of
specific breeds of dogs, other canine genes fine-tune
the impact of the IGF gene within a breed, for example,
making one Great Dane grow a little bit taller or shorter
than another Great Dane, even though all Great Danes
are very tall dogs.

FIGURE 6.10 Model plants
include mustard weed and corn.
(A) Arabidopsis flowers. (B) The
maize (corn) genome was sequenced
by a collaboration of several
research groups.

In 2005, a team led by geneticist Kerstin Lindblad- 
Toh (Broad Institute of Harvard and MIT) determined the 
DNA sequence of the genome of a female boxer named 
Tasha (Figure 6.11). About 5% of the 2.4 billion base 
pairs of Tasha’s genome DNA are identical to sequences 
in the human and mouse genomes, suggesting that 
these DNA regions might encode genes involved in fun- 
damental biochemical processes that are shared by all 
mammals, such as body plan and development. 

The purebred dogs we know today originated with
the selective dog breeding programs made popular in
Europe in the 1800s. In the wild, natural selection by the
environment favors the survival of the “fittest” dogs, so
that only certain dogs successfully transmit their genes
to offspring. However, when people began to selectively
breed dogs to enrich for traits such as short legs or
longer snouts, the dogs were pampered and protected
from the challenges of living in the wild, which effectively
shielded them from natural selection. In addition to the
desired traits, this breeding process also selected for
mutant genes that could harm the health of purebred
dogs and would probably have been lethal in wild
animals. These detrimental mutations persist in the gene
pool of modern purebred dogs in the form of genetically
inherited diseases and disorders such as hip dysplasia in
large dogs and blindness in other breeds.

The DNA changes that alter the characteristics of
a species or create an entirely new species usually take
place over millions of years, but they can also occur in
short, rapid evolutionary bursts that accompany an
increased accumulation of DNA mutations in the genome.
Studies in dogs show that changes in the repeated DNA
sequences located in key genes are correlated with
certain characteristic traits. Tandem DNA repeats are
short DNA sequences that are repeated many times, one
copy directly after another, in the genome. Scientists
studied the tandem DNA repeats in the genomes of 92
dog breeds and found a strong connection between the
number of tandem repeats in a particular gene and the
function of the protein product expressed from that
gene. The diverse characteristics of different dog breeds
are linked to the rapid accumulation of tandem DNA
repeats at specific locations in the canine genome.

Box 6.1 The Platypus Genome Is Really an Odd Duck

The strange-looking duck-billed platypus resembles a creature
made by combining characteristics from birds, mammals, and
reptiles (Figure 6.12). Not only does the platypus have a ducklike
bill and webbed feet and lay eggs, but it also sports a fur coat,
has a tail like a ping-pong paddle, and makes milk to feed its
young. In the nineteenth century, the platypus was classified as
a mammal, but unlike typical mammals, which lost their reptilian
features during evolution, the male platypus has retained
the ability to produce venom in the spurs on its hind legs.

The platypus evolved from a lineage that branched off from
the mammalian lineage about 166 million years ago and
gave rise to animals with features common to both mammals
and reptiles, such as the ability to lay eggs. Two kinds of these
egg-laying mammals, called monotremes, still exist today,
although most would agree that the “family resemblance”
between the platypus and the echidna (spiny anteater) is
difficult to appreciate (Figure 6.12). The unusual traits of the
platypus raised important questions about the organization
and evolution of the platypus genome.

An international group of scientists funded by the NHGRI
analyzed the genome DNA from a female platypus from
Australia named Glennie. Analysis of the platypus genome
sequence proved to be very difficult because about 50% of
the platypus genome is made up of repeated DNA sequences,
which make it difficult to determine the original positions of
the repeated sequences in the native genome. The scientists
found that the platypus genome contains about 18,500 genes
distributed over 52 chromosomes, including some large
chromosomes, many small chromosomes, and 10 sex
chromosomes. Interestingly, the DNA sequences in the platypus
X chromosome are similar to the DNA sequences in a sex
chromosome found in birds, in support of an evolutionary
relationship between the platypus and birds.

Scientists found additional evidence that the peculiar
mix of platypus traits is reflected in the DNA of the platypus
genome, including reptile-like genes that direct venom
production but that evolved independently of the venom genes
in modern reptiles. The platypus genome contains genes
required for egg laying and for milk production (lactation).
The female platypus makes nutritious milk containing sugars,
fats, calcium, and some milk proteins that closely resemble
the milk proteins made by other mammals. These features
suggest that the sophisticated lactation processes in humans
and platypuses evolved in mammals before the Jurassic period.

The platypus and the echidna both descended from
a very early, primitive form of mammal, and the platypus
genome shares about 82% of its gene sequences with the
echidna, human, mouse, and dog genomes. This is not
surprising, because all eukaryotes are made of cells with similar
genomes that encode the proteins needed to make a functional cell.


FIGURE 6.12 The egg-laying mam- 
mals, the platypus and echidna. 
(A) The platypus is a duck-billed, web- 
footed, egg-laying, venomous mam- 
mal. (B) The echidna (spiny anteater) 
is also an egg-laying mammal. 


Tandem repeat mutations often result from mistakes
made when the genome DNA is copied (replication);
sometimes the enzyme copying the genome DNA
(DNA polymerase) slips and accidentally inserts a few
extra DNA base pairs, which can expand into many
tandem repeats over successive rounds of genome
replication. Tandem DNA repeats are common in the
genomes of most species; in humans, an expanded
repeat of the DNA triplet CAG plays a key role in
Huntington’s disease (see Chapter 10).
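Counting tandem repeats amounts to finding the longest
run of back-to-back copies of a short repeat unit. The
Python sketch below assumes the repeat unit is already
known (CAG, the triplet expanded in Huntington’s disease,
is used as the example) and models each slippage error as
inserting exactly one extra copy of the unit. The gene
fragment and the function names are hypothetical, chosen
only to illustrate how repeated replication errors lengthen
a repeat.

```python
def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        # Step forward one repeat unit at a time while the copies keep matching.
        while seq.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


def slip_insert(seq: str, unit: str) -> str:
    """Toy model of polymerase slippage (a hypothetical simplification):
    add one extra copy of `unit` directly in front of its first occurrence."""
    i = seq.find(unit)
    if i == -1:
        return seq
    return seq[:i] + unit + seq[i:]


# Hypothetical gene fragment carrying 5 tandem copies of CAG.
gene = "ATG" + "CAG" * 5 + "CCGTTA"
print(longest_tandem_run(gene, "CAG"))  # 5

# Three rounds of replication, each with one slippage error.
for _ in range(3):
    gene = slip_insert(gene, "CAG")
print(longest_tandem_run(gene, "CAG"))  # 8
```

Real repeat expansions are not this regular: slippage can add or remove several units at once, and it happens only occasionally rather than in every round of replication.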


The selective breeding of dogs for many generations influ- 
enced the evolution of the dog genome in ways that differ 
from standard evolutionary processes. Further research will 
investigate the mechanisms responsible for the rapid evolu- 
tion of our canine friends. 
EARLY HUMAN GENOME MAPS


Human Gene Linkage Maps 


Before automated DNA sequence analysis was rou- 
tinely available in the 1990s, maps depicting the 
human chromosomes were drawn by geneticists using 
genetic information to determine the relative locations 
of genes and other landmarks along a DNA molecule 
(Figure 6.13). These gene linkage maps show the genes 
on the chromosome DNA based on the frequency of 
recombination events that occur between genes when 
the cells go through meiotic cell division. Meiosis is the
process whereby cells reduce the number of chromosomes
by half to produce gametes (sex cells) such as sperm
and eggs. Recombination, also called crossing over, takes
place when DNA sequences are physically exchanged
between two homologous chromosomes. This process can
involve the exchange of only a few base pairs or many
thousands of DNA base pairs (Figure 6.14). Recombination
is a normal genetic process that “shuffles genes”
during meiosis, which contributes to genetic variation
and helps to ensure that each human being inherits a
unique human genome (except for identical twins).
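The exchange shown in Figure 6.14 can be modeled by
treating each chromatid as an ordered list of alleles and
swapping everything beyond a single breakpoint. This
Python sketch is a simplified illustration: it assumes
exactly one crossover, located between the B and C genes,
and the function name `crossover` is hypothetical.

```python
def crossover(chromatid_1: list[str], chromatid_2: list[str], point: int):
    """Swap all alleles to the right of `point` between two paired chromatids.

    Gene order stays the same; only the alleles carried beyond the
    breakpoint trade places, as in a single crossover event.
    """
    if len(chromatid_1) != len(chromatid_2):
        raise ValueError("homologous chromatids must carry the same genes")
    recombinant_1 = chromatid_1[:point] + chromatid_2[point:]
    recombinant_2 = chromatid_2[:point] + chromatid_1[point:]
    return recombinant_1, recombinant_2


dark_blue = ["A", "B", "C"]   # wildtype alleles
light_blue = ["a", "b", "c"]  # mutant alleles

# Breakpoint after the second gene: between B and C.
print(crossover(dark_blue, light_blue, 2))
# (['A', 'B', 'c'], ['a', 'b', 'C'])
```

Each recombinant chromatid keeps the gene order A, B, C but now carries a new combination of alleles.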
FIGURE 6.13 Navigate the human genome using chromosome
maps. The banded condensed chromosome at the top shows the
constricted centromere region and displays the banding patterns
along the chromosome arms, visible after staining for cytology.
Chromosome maps show the relative positions of chromosome landmarks
(centromere, telomere, etc.) and genes on the DNA molecule,
which extends from one end of the chromosome to the other end.
Genetic maps or gene linkage maps are derived from the results of
meiotic recombination studies, which reveal the relative positions
of the genes on each chromosome DNA helix. Physical maps are
based on the most up-to-date sequence analysis of the chromosome
DNA (often from sequence analysis of overlapping genomic DNA
fragments). The physical map of a chromosome is at a very high
resolution and is based on the DNA sequence of the chromosome.
The distance between two genes on the linear
DNA in a chromosome is related to the frequency of 
meiotic recombination events that occur between the 
two genes. If the two genes are physically located 
close together on the chromosome, then the length of 
the DNA between the genes is too short to undergo 
frequent DNA exchanges, resulting in a low recom- 
bination frequency. In this case these adjacent genes 
are said to be “linked” and are almost always inher- 
ited together by individual offspring. In contrast, genes 
that are located far apart on the chromosome DNA 
undergo frequent recombination exchange events and 
as a result are inherited as independent, “unlinked” 
genes. The genetic distance determined from the
frequency of recombination between two genes is
measured in centimorgans (cM), named for the famous
early geneticist Thomas Hunt Morgan. A meiotic
recombination frequency of 1% means that the two
genes in question are tightly linked on the chromosome
and are inherited together in 99% of the gametes
produced by meiosis. By definition, these genes are
separated on the chromosome by a genetic distance of
1 cM, which in humans corresponds, on average, to a
DNA distance of about 1 million base pairs.

FIGURE 6.14 Genetic linkage maps are based on recombination
events (genetic recombination). (A) Two homologous chromosomes
are shown, one dark blue, one light blue; each chromosome contains
two chromatids. The light blue and dark blue chromosomes
pair side-by-side during meiosis. The dark blue chromatids carry the
normal (wildtype) gene alleles: A, B, and C. The light blue chromatid
DNA carries the mutant versions (alleles) of the same genes: a,
b, and c (shown in lowercase). When DNA exchange events (red
dotted line) occur between chromosomes with very similar DNA
sequences, the dark and light blue chromatids can exchange DNA
helices (box with dotted lines). (B) After the dark blue and light blue
chromatids exchange DNA (box with dotted lines), the order of
the A, B, and C genes on the chromosome is not changed, but the
DNA strands encoding the normal wildtype gene (C) and the mutant
(c) gene have been swapped between chromosomes. The normal
A, B, and C alleles start out on the dark blue chromatids, with the
mutant a, b, and c alleles on the light blue chromatids. After the
crossover event, the wildtype C allele (dark blue chromatid) has
swapped positions with the mutant c allele (light blue chromatid),
linking the light blue a and b alleles to the dark blue wildtype C
allele. The other recombined chromatid contains the dark blue A and
B alleles linked to the light blue mutant c allele.
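The arithmetic linking recombination frequency,
centimorgans, and base pairs can be sketched in a few
lines of Python. The recombinant and total offspring
counts below are hypothetical, and the conversion of
1 cM to about 1 million base pairs is the rough human
average given in the text, not a fixed constant; the
function names are illustrative.

```python
def recombination_frequency(recombinant: int, total: int) -> float:
    """Fraction of offspring (gametes) that carry a new combination of alleles."""
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("need total > 0 and 0 <= recombinant <= total")
    return recombinant / total


def to_centimorgans(rf: float) -> float:
    """For closely linked genes, 1% recombination corresponds to 1 cM."""
    return rf * 100


def approx_base_pairs(cm: float, bp_per_cm: float = 1_000_000) -> float:
    """Rough physical distance, assuming the human average of ~1 Mb per cM."""
    return cm * bp_per_cm


rf = recombination_frequency(recombinant=3, total=300)  # hypothetical counts
cm = to_centimorgans(rf)
print(f"RF = {rf:.2%}, distance = {cm:.1f} cM, about {approx_base_pairs(cm):,.0f} bp")
# RF = 1.00%, distance = 1.0 cM, about 1,000,000 bp
```

The simple rule that 1% recombination equals 1 cM works only for closely linked genes. Over longer distances, double crossovers can restore the original allele combinations, so the observed recombination frequency underestimates map distance, and the number of base pairs per centimorgan also varies along each chromosome.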


Down to Details: Physical Maps of the 
Human Genome 


Physical chromosome maps are often based on known
DNA sequences and accurately depict the positions of
genes along the DNA at very high resolution (Figure
6.13). In 1992, a team of 35 coauthors published the
first physical map of chromosome 21, the smallest human
chromosome, predicting the locations of genes involved
in amyotrophic lateral sclerosis (Lou Gehrig’s disease),
epilepsy, and Alzheimer’s disease.

A real milestone was reached in 1999 when scientists
completed the sequence of the first human chromosome
(chromosome 22); later that year, researchers
celebrated sequencing the billionth base pair of human
genome DNA. David Page and colleagues (Whitehead
Institute for Biomedical Research) published the first
high-resolution physical map and sequence of the
human Y sex chromosome in Science magazine
(Figure 6.15).


FIGURE 6.15 Human X and Y sex chromosomes. Condensed 
mitotic human sex chromosomes are shown; X (left) and Y (right). 
Early maps showed the positions of genes on the chromosomes
but contained little information about the sequence
or molecular characteristics of the human genome. This
changed with the development of rapid, automated DNA
sequencing technologies that were essential to the huge
success of the Human Genome Project.


DETERMINING THE DNA SEQUENCE OF 
THE ENTIRE HUMAN GENOME 


History of the Human Genome Project (HGP)


Officially the Human Genome Project (HGP) was a 
13-year (1990-2003) research effort staffed by U.S. 
and international scientists with the goal of determin- 
ing the DNA sequence (the exact order of the DNA 
bases) of the entire human genome. This amazing feat 
required the combined skills and dedicated cooperation
of hundreds of research scientists working in private
and public laboratories around the world. In the
end, the publicly funded scientists and the private-sector
corporate scientists published their draft human genome
sequences at the same time, in 2001, in two top
scientific journals.

The first documented proposal to sequence the 
human genome DNA was made by Robert Sinsheimer 
in 1985. Many noted scientists were strong advocates of 
the HGP, including Nobel Laureates Walter Gilbert and 
Paul Berg. The Human Genome Organization (HUGO) 
was established in 1988 to begin to organize genome 
sequencing efforts, followed by the first annual con- 
ference on human genome mapping and sequencing 
held at the Cold Spring Harbor Laboratory, Cold Spring 
Harbor, New York. Because of extensive government red 
tape and funding issues, the HGP was not fully under- 
way until 1990. 


Public Support and Free Access to the 
HGP DNA Sequence 


During the 1990s, many established and new corpora- 
tions moved into the biotechnology arena and began 
to participate directly in human genome research. The 
fact that most of the U.S. funding for the Human Genome
Project came from taxpayer funds earned the HGP the
nickname “the public human genome project,” which 
easily distinguished the taxpayer-funded public HGP 
from the corporate-sponsored “for-profit private-sector 
HGP.” The international race to sequence the human 
genome promoted stiff competition between these two 
groups, the consortium of publicly funded HGP scien- 
tists and the privately funded research sponsored by 
the biotechnology companies. 
James D. Watson, co-discoverer of the DNA
double helix, was appointed to be the first director 
of the public HGP (see Chapter 2). Watson reassured 
the public, Congress, and the scientific community 
that credible science, and not politics, would be the 
primary focus of the taxpayer-funded HGP research. 
Watson also committed HGP funds to support stud- 
ies on the ethical, legal, and social issues (ELSI) that 
arise as a result of human genome research. In 1994, 
the HGP ELSI Working Group supported the federal 
Genetic Privacy Act, legislation designed to regulate 
the collection, analysis, storage, and use of human 
DNA samples and personal genetic information. 
ELSI established free online access to DNA informa- 
tion resulting from HGP research, but the human 
DNA information generated by the private research 
corporations was not accessible to the public without 
charge. ELSI continues to support programs to edu- 
cate the U.S. public about the human genome and the 
impact of the human genome sequence information 
on society. 

The next HGP director, Francis Collins, realized 
that the private biotechnology companies were fast 
becoming much more competitive in the race to 
sequence the human genome (Figure 6.16). Collins 
successfully refocused the goals of the public HGP and 
made important changes that dramatically increased 
the productivity of the public research efforts. The 
technologies used in public HGP labs were modified 
to take advantage of the improved DNA sequencing 
and cloning methods used by the private labs. 


Techniques to sequence DNA were first developed
many years before scientists conceived of sequencing 
the entire human genome. 

FIGURE 6.16 Francis Collins, director of the HGP, refocused the public
HGP goals. As director, Collins made changes that increased the productivity
of the public research efforts to sequence the human genome.

FIGURE 6.17 Before DNA sequencing became automated, DNA sequence analysis was performed “by hand” and read “by eye.”
(A) Scientists use a pipetting device to add a small DNA sample into the well of a gel. During electrophoresis, the DNA molecules separate in
the gel according to their length. The longer DNA strands migrate slowly and stay near the top of the gel, and the shorter DNA strands migrate
quickly and run near the bottom of the gel (see Chapter 3). (B) The DNA strands produced in the sequencing reactions are separated
on a gel, and the bands of DNA are detected by exposing the gel to an X-ray film. The order of the DNA bands in the four sequencing
lanes indicates the DNA sequence (see Chapter 3).

In the 1970s and 1980s, DNA sequencing experiments
were performed by hand in the lab, using methods
that were expensive and labor intensive and that
required dangerous chemicals
and radioactive reagents (Figure 6.17). However, by
the mid-1980s the Sanger chain termination method
of DNA sequence analysis, named after Fred Sanger,
who developed the technique and was twice awarded
the Nobel Prize in Chemistry, was fast becoming the
method of choice everywhere. The Sanger chain
termination method is an ingenious experimental
approach that uses the same enzyme that cells use to
make copies of DNA. When a DNA strand is copied or
replicated, the enzyme adds nucleotide building blocks
to the new growing strand of DNA (see Chapter 2).
Sanger added four substitute “dideoxy-” nucleotides
to the reaction mix. These differ from the normal
building blocks because they lack an essential chemical
group required for DNA synthesis, the 3’ hydroxyl
group (—OH) (Figure 6.18). When the enzyme adds a
dideoxy building block, the new DNA strand stops
growing (terminates) at that specific spot on the DNA
template strand. Analysis of the collection of terminated
DNA strands reveals the sequence, or order, of the
bases in the DNA strand. The quality of DNA sequence
analysis depends on having a good DNA template
strand to be copied by the enzyme in the sequencing
reactions.
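The logic of reading a chain termination experiment can be sketched as a toy model. This is an illustration under simplifying assumptions of our own, not a description of any real sequencing software: the template is written 3’ to 5’ so the new strand is simply its base-by-base complement, the primer is ignored, and each of the four reactions is assumed to contain every possible terminated strand. The function names are hypothetical.

```python
# Toy model of Sanger chain termination (illustrative assumptions only):
# the template is written 3'->5', so the new strand is its base-by-base
# complement written 5'->3'; the primer is ignored; and every possible
# termination product is present in each of the four reactions.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template: str) -> str:
    """Sequence the polymerase would build on this template (no terminators)."""
    return "".join(COMPLEMENT[base] for base in template)


def terminated_lengths(template: str, dideoxy: str) -> list[int]:
    """Lengths of the strands that stop when the given dideoxy base is added."""
    strand = new_strand(template)
    return [i + 1 for i, base in enumerate(strand) if base == dideoxy]


def read_gel(template: str) -> str:
    """Read the four lanes from the shortest band (bottom) to the longest (top)."""
    bands = sorted(
        (length, dideoxy)
        for dideoxy in "ACGT"
        for length in terminated_lengths(template, dideoxy)
    )
    return "".join(dideoxy for _, dideoxy in bands)


print(terminated_lengths("TACGGA", "C"))  # [4, 5]
print(read_gel("TACGGA"))                 # ATGCCT
```

Sorting all bands by length mirrors reading a gel from the bottom up: the shortest strand ends at the first base added, the next shortest at the second, and so on.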

Early studies on human genes required that the 
individual genes be painstakingly identified by years 
of research using genetic, biochemical, and molecular 
methods to find and clone the specific DNA of interest. 
At that time researchers studying human genes had to
gather enough information to determine the location
of a single gene in the human genome, even though
they did not know the gene’s DNA sequence or even
which chromosome carried it. For
example, the search for the specific human gene that 
causes Huntington’s disease involved genetic studies 
on families with this inherited disease (see Chapter 10). 

Scientists developed a method called shotgun
DNA cloning to make it easy to prepare template
DNA for sequencing analysis. Shotgun cloning and
Sanger chain termination (dideoxy-) DNA sequencing
became standard approaches used together to
determine the sequence of very large DNA molecules,
including genomes (Figure 6.19). The long double-stranded
genome DNA molecules are cut into shorter
DNA fragments, which are then inserted, or cloned,
into plasmid vectors (see Chapters 4 and 5). From then
on, the plasmids carry the inserted DNA fragments,
which replicate as part of the plasmid DNA molecules.
The scientists then use the Sanger chain termination
method to sequence the foreign DNA inserted
into the vectors. The DNA sequence information is sent
to computer programs for analysis (Figure 6.20). To
reconstruct an entire genome sequence, the
scientists must piece the sequenced DNA fragments
back together, bit by bit, until they have reassembled
the original genome sequence. Computers play an
essential role in this entire process.
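The idea of piecing fragments back together by their overlaps can be illustrated with a toy greedy assembler. This is a simplified stand-in chosen for clarity, not the algorithm used by any genome project: it ignores sequencing errors, repeated DNA, and the fact that reads can come from either strand. The reads below are hypothetical pieces of a made-up 19-base sequence.

```python
# Toy greedy overlap assembler (a simplified stand-in, not any project's
# actual method): repeatedly merge the pair of reads with the longest
# suffix/prefix overlap. Ignores errors, repeats, and reverse complements.


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that matches a prefix of b (0 if < min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlaps remain; return the contigs."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining gaps cannot be closed
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

When no read overlaps another, the function returns several separate pieces (contigs) instead of one sequence, which is how gaps in coverage show up in practice.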

The Sanger chain termination method is very well 
suited for use in automated DNA sequencing technol- 
ogy. But even the best DNA sequence method yields 
only several hundred bases of readable DNA sequence 
per sequencing reaction. 
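Because each reaction reads only several hundred bases, a genome must be sampled many times over. A rough estimate can be made with the Lander-Waterman model, which assumes equal-length reads landing at uniformly random positions; under that model, the expected fraction of the genome never read at c-fold coverage is e^(−c). The 500-base read length and 8-fold coverage below are illustrative values we chose, not figures from the HGP.

```python
import math

# Back-of-the-envelope shotgun planning under the Lander-Waterman model
# (assumption: equal-length reads at uniformly random positions). The read
# length and coverage used below are illustrative choices.


def reads_needed(genome_bp: int, read_len_bp: int, coverage: float) -> int:
    """Number of reads so that, on average, each base is read `coverage` times."""
    return math.ceil(coverage * genome_bp / read_len_bp)


def fraction_unread(coverage: float) -> float:
    """Expected fraction of the genome covered by no read at all: e^(-coverage)."""
    return math.exp(-coverage)


print(reads_needed(3_200_000_000, 500, 8))  # 51200000
print(round(fraction_unread(8), 5))         # 0.00034
```

Even at 8-fold coverage, the model predicts that roughly 0.03% of a 3.2-billion-base genome (about a million bases) would go unread, one reason the first drafts still contained gaps.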
FIGURE 6.18 Dideoxy DNA building blocks stop DNA synthesis. (A) For DNA synthesis to begin, a short DNA primer must be
base paired to the template DNA strand. The DNA primer contains a
“free” 3' OH (hydroxyl) group that is required to add the next building
block during synthesis of the new DNA strand. (B) When a dideoxynucleotide
chain terminator (circled) is added to the 3’ end of
the primer, the lack of the 3’ OH means that DNA synthesis must
stop (chain termination) because the polymerase enzyme cannot
add to the primer. (C) The differences between the deoxy (left) and
dideoxy (right) nucleotides are fundamental to chain termination
DNA sequencing.
FIGURE 6.19 Shotgun cloning used to sequence DNA genomes. (A) Shotgun DNA cloning and sequencing is an efficient way to determine the
sequence of large genomes like the human genome. (B) The random fragments are cloned randomly into plasmid vectors, without identifying 
or characterizing the DNA fragments until after DNA sequence analysis is complete. (C) The recombinant plasmids are introduced into bacte- 
rial cells where they replicate to provide enough cloned human genome DNA for sequencing. More recently, polymerase chain reaction (PCR) 
permits scientists to skip the cloning steps entirely and directly sequence the genome DNA itself (see Chapter 5). (D) The human DNA fragment 
(blue line) is cloned into a plasmid vector (dotted line). (E) The double-stranded human DNA is cut into shorter double-stranded DNA fragments. 
(F) Each individual DNA sequencing reaction (dotted box) contains a DNA template to be sequenced (light blue line) and a DNA primer (green 
arrow) that is complementary to one region of the DNA fragment. In a separate reaction, a different primer (red arrow) is used to determine the 
sequence of the opposite strand of that DNA fragment. (G) Many different short primers (purple arrows) are used to determine the sequences 
of all of the DNA strands (dark blue lines). The results from sequencing are sent to computers that analyze and align the DNA sequences and 
reconstruct the sequence of the original cloned DNA fragment, eventually revealing the sequence of an entire human chromosome. 
FIGURE 6.20 Automated DNA sequence analysis requires a DNA sequencing machine. This DNA sequencing machine automatically per-
forms the chemical reactions, resolves the DNA products of the reactions on a gel or using a capillary system, and sends the DNA sequence 
information directly to the computer. 
FIGURE 6.21 Decoding the human book of life. (A) Craig Venter and Francis Collins were interviewed in the popular press, including Time
magazine (2000); (B) Craig Venter, President Bill Clinton, and Francis Collins; (C) the public and private Human Genome Projects’ DNA
sequences were published in two scientific journals at the same time. Celera and Craig Venter published in Science, whereas the International
Consortium and Francis Collins published in Nature.


In 1998, J. Craig Venter and Mike Hunkapiller
founded Celera Genomics, which focused on developing
the sophisticated techniques needed to manipulate
and sequence large genomes. They equipped Celera
with new DNA sequencing instruments and pioneered
the technologies needed for large-scale automated DNA
sequencing. Many scientists were skeptical of Venter’s
focus on automated DNA sequencing analysis, including
James Watson, co-discoverer of the structure of the DNA
double helix (see Chapter 2), who had been the first
director of the public Human Genome Project. Watson
challenged Venter’s priorities in public, and Venter
responded by stating that Celera Genomics would not
only finish a draft DNA sequence of the human genome
by 2001, but would spend only $200 million to complete
the project. With this exchange the race between the
public and the private human genome scientists to
sequence the entire human genome began in earnest.

At Celera Genomics, Venter and Hunkapiller built a
huge DNA computer facility the size of a football field 
to analyze DNA sequence information, which was at 
the time the largest civilian supercomputer in existence. 
They outfitted the Celera labs with hundreds of newly 
developed automated DNA sequencing machines that 
transferred the DNA sequence information directly 
to the computer (Figure 6.20). In time the revolution- 
ary technological advances made by the Celera scien- 
tists were adopted by most Human Genome Project 
research groups, both private and public, which had an 
enormous positive impact on the progress of the work 
to complete the human genome sequence. 

As DNA sequence analysis became automated and 
much less expensive, many biotechnology companies 
began to offer DNA sequencing services in addition 
to customized recombinant DNA cloning and gene
library services. Today modern research labs usually
send DNA out to companies to be sequenced or use 
DNA sequencing kits, which provide the reagents, con- 
trols, buffers, and the necessary enzymes to perform 
DNA sequencing reactions in research labs and in 
many teaching labs as well. Some biotechnology com- 
panies have focused on developing therapeutic agents 
derived from the results of human genome research, 
while others have worked on the commercial compu- 
ter software used to construct and maintain DNA and 
protein sequence databases (see Chapter 7). 

In 2001, nearly fifty years after the discovery of the
DNA double helix, Francis Collins and Craig Venter, 
the leaders of the public and private HGP groups, 
announced the release of the first version of the human 
genome DNA sequence, ahead of schedule and under 
budget. The two competing research groups agreed to 
publish their research simultaneously in two equally 
prestigious scientific journals, the British Nature and 
the American Science (Figure 6.21). 


WHAT WE LEARNED FROM THE HUMAN 
GENOME SEQUENCE 


Only 2% of Human Genome DNA Codes 
for Proteins 


A new human DNA genome is made when the zygote 
inherits a complete genome complement of 46 chro- 
mosomes, 23 chromosomes from Mom and 23 chro- 
mosomes from Dad (Figure 6.22). How similar are the 
DNA sequences of the two different versions of the 
same human chromosomes, for example, chromosome 
21 from Mom and chromosome 21 from Dad? Since
the first draft of the human genome sequence was
released in 2001, the sequence has been updated; the
2008 version showed that the human genome contains
3165 million (3.2 billion) base pairs of DNA carried
on 23 chromosomes. Most human cells (except sperm
and egg) contain 23 chromosome pairs (46 chromosomes),
carrying roughly twice that amount, about 6.3 billion bp
of DNA, in each nucleus.

FIGURE 6.22 A new human DNA genome is created when the
human zygote inherits a complete set of chromosomes from Mom
and Dad. (A) Mom has one copy of each chromosome in each egg
cell, including chromosome 3; meiosis cell division reduces the
number of chromosomes when producing egg and sperm cells. (B)
Dad has one copy of each chromosome in each sperm cell, including
chromosome 3; meiosis cell division reduces the number of chromosomes
when producing egg and sperm cells. (C) The DNA in chromosome
3 from Mom and the DNA in chromosome 3 from Dad have
similar sequences, but they are not identical; Mom’s and Dad’s
chromosome 3s carry different alleles of the A, B, and C genes. The
dark blue chromosome 3 DNA carries the normal (wildtype) A, B,
and C alleles. The light blue chromosome 3 DNA carries the mutant
a, b, and c alleles. (D) When the zygote begins to divide
by mitosis to make more cells, all of the cellular components, including
the chromosome DNA, are duplicated to prepare for the new
cells. Mitosis maintains the number of chromosomes in the cells,
whereas meiosis reduces the number of chromosomes by half.

Box 6.2 Venter Is Out to Rescue the Earth’s Biodiversity

DNA and Biotechnology

Human genome scientist J. Craig Venter has embarked on
an amazing project to save the world from the crisis caused
by the rapid depletion of the Earth’s genetic biodiversity.
Unfortunately, scientists know surprisingly little about the vast
numbers of life forms on Earth, especially the diverse
microorganisms living in the world’s oceans. This wealth of
biodiversity holds many genetic secrets and has great potential.
For example, completely unknown plants or microorganisms
could offer new sources of “green,” renewable energy, provide
new pharmaceutical medicines, and present new solutions
to the climate crisis. Sadly, most of these unknown
organisms are rapidly becoming extinct, and once they are gone,
scientists will no longer have access to their genetic information.
Venter decided to approach this problem by collecting samples
from the ocean and cataloging thousands of new genes
isolated from organisms found in the deep. Venter collected
the samples while circling the globe on his yacht Sorcerer
II and then sent the captured sea life to the J. Craig Venter
Institute (JCVI) in Rockville, Maryland, for DNA sequencing
and other analyses.

The human genome contains noncoding DNA,
regions of the genome DNA that do not code for
proteins. Even so, it was a big surprise to learn that
about 98% of the human genome does not code for
proteins.

The protein-coding instructions for all of the genes
needed to make a human being occupy only about 2%
of the human genome DNA. This means that only a
relatively small fraction of the DNA base pairs in the
genome carry the instructions for the human master plan.
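To put the 2% figure in perspective, here is the arithmetic using the 2008 genome size of 3165 million base pairs given earlier in this section (values rounded):

```latex
\[
  0.02 \times \left(3.165 \times 10^{9}\ \text{bp}\right)
  \approx 6.3 \times 10^{7}\ \text{bp}
  \approx 63\ \text{million bp}
\]
```

By this estimate, roughly 3.1 billion of the genome's 3.2 billion base pairs do not code for protein.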


How Many Human Genes Are in the Human 
Genome DNA? 


For years scientists have estimated the total numbers of 
genes in the genomes of various organisms using differ- 
ent methods, but accurate numbers are still often diffi- 
cult to find, even though genome sequencing projects 
are now routine. Part of the problem is that the defini- 
tion of a “gene” has changed as scientists learned more 
about the structure and function of different types of 
genes. It was especially important to include the fact 
that the majority of human (and most other eukaryotic) 
genes contain intron and exon sequences (Figure 6.23). 
This complicates the issue because in many cases the 
exon of one gene can serve as the intron for a differ- 
ent gene. In addition, one DNA coding region in the 
genome can serve as the template for RNA transcripts, 
which are processed in such a way that they can actu- 
ally code for a number of different proteins. DNA 
sequence analysis shows that the budding yeast genome 
encodes 5,770 genes, the roundworm contains 19,427 
genes, and the fruit fly genome has 13,379 genes. 
It seems entirely reasonable that humans should need
a significantly larger number of genes than single-celled
yeast or fruit flies (Figures 6.8-6.11).

FIGURE 6.23 Most human genes contain introns and exons encoded in the genome DNA. (1) When the gene is expressed in the cell, the DNA
is first copied into a long precursor RNA transcript. (2) This precursor RNA contains the introns and exons in the coding region, which dictates the
order of amino acids in the new protein chain. (3) Both exons and introns are included in the long precursor RNA transcript, but RNA splicing
enzymes remove the introns from the precursor RNA to make a messenger RNA. (4) The final messenger RNA (mRNA) sequence contains only the
exons, precisely linked together to reconstitute the complete protein-coding region, and is ready for transport to the cytoplasm and translation.

Before the start of the HGP, most people estimated 
the total number of human genes at between 80,000 
and 100,000. So imagine the surprise in 2001 when 
the HGP scientists announced that the human DNA 
genome encodes only between 30,000 and 35,000 
genes. The recent updates of the human genome DNA 
sequence provided a much more accurate picture of 
the size and complexity of the human genome. In 
2008 the total number of different human genes was 
estimated at between 20,000 and 25,000. 

Many types of eukaryotic cells, including human
cells, use RNA splicing to express more than one
protein product from a single gene (see Chapter 3).
The genome DNA containing the intron and exon
sequences of a gene is copied into long precursor
RNAs containing both introns and exons (Figure 6.23).
RNA splicing enzymes remove the introns and produce
final mRNAs containing only the exons needed to be
translated into an amino acid protein chain (see
Chapter 3).
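What splicing accomplishes can be sketched in a few lines, assuming the exon boundaries are already known. Real splicing enzymes locate the boundaries themselves from sequence signals (most introns begin with GU and end with AG in the RNA); the short precursor and its coordinates below are made up for illustration.

```python
# Minimal sketch of the result of splicing, assuming exon coordinates are
# already known. The precursor RNA and coordinates are hypothetical.


def splice(precursor: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in order, discarding introns."""
    return "".join(precursor[start:end] for start, end in sorted(exons))


exon1, intron, exon2 = "AUGGCU", "GUAAGUCAG", "GAAUAA"
precursor = exon1 + intron + exon2         # 21-base precursor RNA
mrna = splice(precursor, [(0, 6), (15, 21)])
print(mrna)  # AUGGCUGAAUAA
```

The spliced message reads AUG GCU GAA UAA: a start codon, two amino acid codons, and a stop codon, with the intron's bases gone.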

Differently spliced RNAs copied from the same 
gene can carry different RNA information that can be 
translated into different proteins. This gene expression 
strategy increases the number of different proteins 
available to the cell without substantially increasing the 
total number of genes encoded by the genome. 
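One way to see how splicing multiplies protein products is a toy model of a single kind of alternative splicing, "cassette" exons that can be kept or skipped. The assumptions are ours: the first and last exons are always kept and exon order never changes. Real alternative splicing also uses other patterns (alternative splice sites, retained introns, mutually exclusive exons). Exon names are hypothetical labels.

```python
from itertools import product

# Toy model of cassette-exon skipping. Assumptions: first and last exons are
# always kept, and exon order is preserved. Exon names are hypothetical.


def isoforms(first: str, optional: list[str], last: str) -> list[list[str]]:
    """Every exon combination obtainable by keeping or skipping each optional exon."""
    result = []
    for choices in product((True, False), repeat=len(optional)):
        kept = [exon for exon, keep in zip(optional, choices) if keep]
        result.append([first, *kept, last])
    return result


for mrna in isoforms("E1", ["E2", "E3"], "E4"):
    print("-".join(mrna))
# E1-E2-E3-E4
# E1-E2-E4
# E1-E3-E4
# E1-E4
```

With n optional exons this model allows 2^n different mRNAs from one gene, so even a handful of optional exons can yield many protein variants.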

An average human gene occupies about 3000 bp
(3 kb) of genome DNA, but because the intron
sequences vary considerably in length, almost all
human protein-coding genes are significantly longer
than their actual protein-coding regions.
At this time the largest known human gene encodes 
the dystrophin protein. Mutant dystrophin proteins 
cause muscular dystrophy (see Chapter 10). The dys- 
trophin gene covers 2.4 million bp of genome DNA, 
including many exon and intron sequences. The long 
precursor RNA copied from the dystrophin gene is 
spliced to remove the introns, which converts each 
dystrophin RNA precursor into a much shorter messen- 
ger RNA (mRNA) containing only the exons needed to 
encode the dystrophin protein. 


Most protein-coding genes in the human genome are
significantly longer than needed to account for the protein
product of the gene. This is because the
genes in the genome DNA contain intron as well as exon
sequences.


Scientists have long studied genes that are copied 
into RNA but are never translated into proteins; instead 
these RNA molecules perform important functions in 
the cell as RNA strands, not proteins. For example, 
transfer RNA genes code for the transfer RNA (tRNA) 
molecules that carry specific amino acids to the ribos- 
omes during protein synthesis (see Chapter 3). Another 
example is the ribosomal RNA (rRNA) genes, which encode
rRNA products. These rRNAs form extensive
base-paired regions and single-stranded loop regions
that function in ribosome assembly and protein
synthesis (Figure 6.24). Discovered more recently are
the small nuclear RNA (snRNA), small nucleolar RNA
(snoRNA), and other small RNA genes, which produce
a variety of short RNAs that function in a long
list of essential cellular processes, including RNA
splicing.

FIGURE 6.24 Base-paired secondary structure of ribosomal RNA
(rRNA). The secondary structure of this rRNA forms a complex pattern
of short double-stranded stems and unpaired single-stranded
loops and bubbles. The structure shown is a continuous single strand
of RNA, starting with the 5’ end and finishing at the 3’ end. The
rRNA folds into a structure with other rRNAs and many special proteins
to make a ribosome.

The human genome DNA sequence gave scientists
the first detailed view of how genes are organized
along the chromosomes. The sequence shows that the
protein-coding genes are not evenly distributed along
the human chromosome DNA. Instead, the genes are
arranged in clusters separated on the chromosome
from other gene clusters by vast regions of noncoding
DNA. Regions with clusters of human genes often
contain stretches of DNA that are unusually rich in
CG dinucleotides (a C followed by a G along the same
strand), which are called CpG islands; the human
genome contains roughly 30,000 of them. Many CpG
islands lie near the beginnings of genes, where they
are thought to help control transcription by the RNA
polymerase enzymes.
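CpG islands are usually identified computationally. A minimal sketch, assuming the widely cited rule of thumb introduced by Gardiner-Garden and Frommer (1987): a stretch of at least 200 bp with GC content above 50% and an observed-to-expected CpG ratio above 0.6, where the expected count is (number of C × number of G) / length. The function names are our own, and the sketch checks one stretch at a time, whereas genome-wide scans slide such a window along each chromosome.

```python
# Rule-of-thumb CpG island check for a single stretch of DNA (one strand,
# 5'->3'). Thresholds follow commonly cited criteria; names are illustrative.


def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio)."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """True if the stretch passes all three thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CG" * 100))            # True
print(looks_like_cpg_island("G" * 100 + "C" * 100)) # False: GC-rich, but no CpG
```

The observed/expected ratio matters because most of the human genome is depleted of CpG dinucleotides; a GC-rich stretch alone does not qualify.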


NINETY-EIGHT PERCENT OF THE HUMAN 
GENOME IS NONCODING DNA 


Jumping DNA Elements Move around the 
Genome 


The human genome DNA sequences that code for 
human proteins were the focus of intense study for 
years before the HGP began. With the completion of 
the human genome sequence scientists have increased 
research on the more mysterious noncoding DNA 
sequences that make up the remaining 98% of the 
human genome DNA. 

Believe it or not, about half of the noncoding DNA 
in the human genome contains transposable DNA 
elements that can physically move (jump or trans- 
locate) from one DNA location to another site in the 
human chromosome DNA (Figure 6.25). This amazing 
behavior is not restricted to the human genome. Many 
different mobile DNA elements have been discovered 
in prokaryotes and other eukaryotes. These transpo- 
sable DNA elements move around the genome using 
different molecular mechanisms. Sometimes transpos- 
able elements are called “selfish DNA” because these 
DNA elements act like parasites that are reproduced 
along with the organism’s genome. Some transposable 
DNA elements have their own genes that also move 
when the DNA element moves. Transposable elements 
sometimes insert into active genes that are transcribed,
and they can disrupt gene expression. Fortunately,
transposable elements rarely insert into active genes in
the human genome.

FIGURE 6.25 Transposable DNA elements can jump around the genome. Some transposons can move from one location in the genome to
another site in the genome; some use a “copy, cut, and paste” mechanism.


Repeated DNAs Are Very Common 
Genome Sequences 


About half of the noncoding DNA in the human 
genome is made up of repeated DNA sequences; the 
human genome has more repeated DNA than the 
fruit fly (3%), worm (7%), and mustard weed (11%) 
genomes. Examples include very short satellite DNA 
sequences that are repeated thousands of times in tan- 
dem in the human genome. Highly repeated short DNA 
sequences are commonly located near the telomeres 
and the centromeres of most eukaryotic chromosomes 
(Figure 6.26). DNA control elements in the genome, 
which regulate gene expression and control the cell 
cycle, occupy a small fraction of the overall human 
genome DNA. These control elements include tran- 
scriptional promoter and enhancer DNA elements that 
regulate gene expression and “DNA replication origin” 
elements involved in the initiation of DNA replication 
and chromosome duplication. DNA control elements 
are located at many critical positions in the genome 
where they influence cell functions by binding to spe- 
cial regulatory proteins made in the cell. 
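Tandem repeats like satellite DNA can be located by scanning for a repeat unit copied back to back. A minimal sketch, assuming the repeat unit is known in advance and that copies are exact; the example uses TTAGGG, the repeat found at human telomeres. Real repeat-finding programs tolerate mismatches and discover the repeat unit themselves. The function name and the min_copies threshold are illustrative choices.

```python
# Find runs of a known repeat unit copied back to back (tandem repeats).
# Assumes exact copies and a known unit; names/thresholds are illustrative.


def tandem_runs(seq: str, unit: str, min_copies: int = 3) -> list[tuple[int, int]]:
    """Return (0-based start, number of copies) for each run of >= min_copies units."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    k = len(unit)
    runs = []
    i = 0
    while i <= len(seq) - k:
        copies = 0
        while seq.startswith(unit, i + copies * k):
            copies += 1
        if copies >= min_copies:
            runs.append((i, copies))
            i += copies * k  # skip past the whole run
        else:
            i += 1
    return runs


print(tandem_runs("ACGA" + "TTAGGG" * 4, "TTAGGG"))  # [(4, 4)]
```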


INDIVIDUAL GENOMES AND GENETIC 
VARIATION 


Genetic Variation in the Human Genome 


The release of the first version of the human genome
sequence prompted researchers to find out more
about the sequences of individual human genomes.
The development of new DNA technologies and rapid
automated DNA sequencing machines has made it
possible to determine the DNA sequences of hundreds
of human genomes since 2001. DNA sequence analysis
shows that each person carries a unique genome
DNA sequence (see Chapter 8). Except in the case of
identical twins, an individual’s genome DNA
sequence is detectably different from the sequences of
all the other human genomes on Earth.

Initial sequence comparison studies suggest that 
the DNA sequences of different human genomes are 
almost identical, 99.9% the same. So no matter how 
different two people might look on the outside, their 
genomes have almost identical DNA sequences. Later 
genome studies show that human genomes might vary 
by as much as 4.5 percent. 
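Even the 99.9% figure leaves room for a great many differences. Assuming a genome of about 3.2 billion base pairs, as given earlier in this chapter:

```latex
\[
  (1 - 0.999) \times \left(3.2 \times 10^{9}\ \text{bp}\right)
  = 0.001 \times 3.2 \times 10^{9}\ \text{bp}
  \approx 3.2 \times 10^{6}\ \text{bp}
\]
```

By this estimate, two people's genomes would differ at roughly 3 million base pairs.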

The rare differences between the DNA sequences 
of individual human genomes, called DNA polymor- 
phisms, have become extremely useful genetic mark- 
ers for human genome studies. There are correlations 
between the susceptibility to certain diseases and the 
inheritance of genomes containing specific DNA vari- 
ations. Detecting the small DNA differences between 
human genomes is the basis for the powerful DNA 
fingerprinting technology used in DNA forensics to 
identify suspects (see Chapter 8). 

FIGURE 6.26 Chromosomes with centromeres (yellow) and
telomeres (light blue). (A) A human interphase nucleus (top
left) is surrounded by many metaphase chromosomes with
the repeated DNA surrounding the centromere regions stained
yellow. (B) The repeated DNAs at the ends or telomeres of
these metaphase human chromosomes are stained light blue.

Restriction fragment length polymorphisms (RFLP,
pronounced “rif-lip”) were the earliest type of detectable
DNA differences used in gene analysis (see
Chapter 10). An RFLP “chromosome marker” occurs when
the DNA of one genome differs at a site that, in the
unchanged genome, is recognized and cut by a highly
specific restriction enzyme. The altered base pair
sequence prevents the restriction enzyme from cutting
at that specific DNA site in the genome that carries the
polymorphism, whereas a genome lacking this
polymorphism is cut by the enzyme at that site. The
RFLP marker therefore reflects whether or not the
restriction enzyme cuts each genome at that specific
site. Cutting the genome DNA generates DNA fragments
of different but predictable lengths that can be routinely
detected by gel electrophoresis and Southern blot
hybridization (see Chapter 5).
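How one base change alters fragment lengths can be sketched with a simulated digest. The assumptions are ours: the enzyme is EcoRI, which recognizes GAATTC and cuts between the G and the first A (G^AATTC); only the top strand is tracked; and the two short "alleles" are made up for illustration.

```python
# Simulated restriction digest showing how one base change creates an RFLP.
# Assumptions: EcoRI (recognizes GAATTC, cuts G^AATTC), top strand only,
# hypothetical alleles.
SITE = "GAATTC"
CUT_OFFSET = 1  # the cut falls 1 base into the recognition site


def cut_positions(seq: str, site: str = SITE, offset: int = CUT_OFFSET) -> list[int]:
    """0-based positions where the enzyme cuts the top strand."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq: str, site: str = SITE, offset: int = CUT_OFFSET) -> list[int]:
    """Lengths of the fragments produced by a complete digest."""
    cuts = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [right - left for left, right in zip(cuts, cuts[1:])]


allele_1 = "A" * 10 + "GAATTC" + "T" * 14  # site present
allele_2 = "A" * 10 + "GAATTT" + "T" * 14  # one base changed: site destroyed
print(fragment_lengths(allele_1))  # [11, 19]
print(fragment_lengths(allele_2))  # [30]
```

On a gel, allele_1 would give two smaller bands and allele_2 one larger band; that difference in band pattern is the RFLP marker.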


Single Base and Copy Number Variations 
in the Human Genome 


Single nucleotide polymorphisms (SNPs) are the single 
base pair differences between the genomes of different 
individuals. The HapMap Project and the Public SNP 
Consortium are cataloging the millions of single base 
differences between individual human genomes. 
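At its simplest, spotting single-base differences means comparing two sequences position by position. A minimal sketch, assuming the two sequences are already aligned base for base (real comparisons must first align reads to a reference). Note that a difference between two people is only a candidate; a SNP is defined by variation across a population. The sequences and function names are hypothetical.

```python
# List single-base differences between two aligned sequences and report
# percent identity. Assumes prior alignment; inputs are hypothetical.


def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, base in a, base in b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions where the two sequences agree."""
    differences = len(find_snps(seq_a, seq_b))
    return 100.0 * (len(seq_a) - differences) / len(seq_a)


person_1 = "ACGTACGTTA"
person_2 = "ACGTTCGTTA"
print(find_snps(person_1, person_2))         # [(4, 'A', 'T')]
print(percent_identity(person_1, person_2))  # 90.0
```

The same percent-identity calculation, applied over billions of aligned bases, underlies figures such as "99.9% identical."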

Researchers at the Wellcome Trust Sanger Institute 
(Cambridge, United Kingdom) compared the DNA 
sequences of genomes from 270 people from different 
ethnic groups: Yoruba (Nigeria), European descendants 
(United States), Han Chinese (Beijing), and Japanese 
(Tokyo). The team measured the number of copies of 
specific genes (gene copy number) in each genome 
using powerful microarray screening techniques to 
locate copy number variations (CNVs) in the genome 
DNA (see Chapter 13). They identified 1447 cases 
of CNVs, which together involved about 12% of the human
genome DNA, a surprisingly large amount of gene
copy number variation.

The 1000 Genomes Project began in January 2008, when
an international consortium of research scientists
announced their goal of using SNPs to provide a very
high-resolution map showing the positions of biomedically
relevant genetic variations in the human genome. This
detailed map of genetic variation will provide an
important new tool for medical research, helping
scientists identify genes that are linked to the different
SNPs and other genetic changes, and the data will be
made available to the international scientific community
through free public databases.

The scientists working on the 1000 Genomes Project 
focused on studying the small fraction of the genome 
DNA that differs between individual human genomes, 
looking for valuable clues to find out why individuals dif- 
fer in their susceptibility to diseases, drug responses, and 
sensitivity to environmental factors. The project started 
with the comprehensive catalog of human genetic vari- 
ations compiled by the International HapMap Project. 
The information in the original HapMap catalog was 
fundamental to the identification of more than 130 
DNA variants that are genetically linked to common 
human diseases. However, the HapMap data are limited 
to detecting only the genetic variants that occur in more 
than 5% of the human genomes. 
The new technologies of the 1000 Genomes Project can
detect rare genetic variations in the human genome 
sequences. Scientists continue to search for possible new 
correlations between these rare genome DNA variations 
and observed differences in behaviors, talents, or person- 
alities among individual people. 


HUMAN AND CHIMPANZEE DNA: WHAT 
MAKES US HUMAN? 


Each species has its own unique genome DNA sequence. 
These special genome sequences distinguish between 
different species. Scientists are studying the similarities 
and differences among the genomes of different species 
to try to determine what sets humans apart from every 
other species. Comparisons among eukaryotic genomes 
show that the human genome contains DNA sequences 
that are also found in the genomes of other eukaryotic 
organisms. Comparisons of the human, monkey, and mouse
genomes revealed shared DNA sequences, suggesting that
these similar genes might be needed for biological
processes common to all eukaryotes. For example, all
mammals have genes that control the development and
location of tissues and organs in the body. In terms of
body structure, mouse and man are actually similar: one
head, a body trunk, and four limbs. All eukaryotic
organisms are made up of cells that contain a nucleus,
and in multicellular organisms these cells must divide
to make more tissue as the organism grows.


All eukaryotic cells need different enzymes to replicate the 
genome DNA, control the cell cycle, process nutrients, and 
make proteins. These biological processes are directed by 
very similar genes in humans, mice, and bananas. 


Six to eight million years ago, the lineages of two
groups of apes separated and evolved independently
into two different species, humans and chimpanzees.
The genomes of modern-day chimps and humans carry 
a record of all of the DNA changes that make humans 
biologically different from chimpanzees, written in the 
DNA sequence. 

Researchers are analyzing the DNA sequences of
genomes from many different organisms, including
chimpanzees as well as organisms that are very
different from humans, such as pufferfish. Initial
comparison studies of the human and chimp genomes
indicated that chimp and human DNA are about
98.5% identical, an observation that raised many
questions and much controversy. In 2003, researchers
compared the DNA sequence of human chromosome 21
with the corresponding chromosome from several nonhuman
primates: chimpanzee, orangutan, rhesus macaque,
and woolly monkey. In all cases, the DNA had been
rearranged much more frequently during the evolution
of the primate genome than was previously thought,
and large regions of the human and chimp genomes
were found to have very different DNA sequences.
Given how genetically similar chimps and humans are,
understanding the differences between the DNA of these
two species promises to reveal important information
about what makes humans human: how we can talk,
walk upright, and read.
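The 98.5% figure is a percent identity: the share of aligned positions at which two sequences carry the same base. The minimal sketch below assumes the two sequences are already aligned and of equal length. Real genome comparisons must first align the sequences and account for insertions, deletions, and rearrangements like those just described, and this sketch skips all of that. The 20-base fragments are made up. The last lines scale the figure to a genome of roughly 3 billion base pairs, a round number assumed here: 1.5% of 3 billion is about 45 million single-base differences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions where two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)

# Made-up aligned fragments that differ at a single position
human_fragment = "ATGGCGTACCTGAAGTTCGA"
chimp_fragment = "ATGGCGTACCTGAAGTTCAA"
print(percent_identity(human_fragment, chimp_fragment))  # 95.0

# Scaling 98.5% identity to an assumed genome size of ~3 billion base pairs
genome_size = 3_000_000_000
differences = round(genome_size * (100.0 - 98.5) / 100.0)
print(f"{differences:,} differing bases")
```

Even a small percentage difference therefore corresponds to tens of millions of individual DNA changes. This is one reason the search for the few functionally important ones is so challenging.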

How can scientists find the specific DNA differences 
between the human and chimp genomes that might 
identify a genetic variation with an impact on evolution? 
Important clues have come from studying genes that are 
involved in developing characteristics that are distinctly 
human compared to chimps. For example, humans 
typically have larger brains than chimps, suggesting 
that scientists might find functionally important DNA 
differences (polymorphisms) between the human and
chimp genomes if they analyzed the sequences of the 
genes involved in promoting growth of the human brain 
(Figure 6.27). One such potentially important DNA dif- 
ference is located in the MYH16 gene, which is mutated 
in the human genome, but is unchanged in the chimp 
genome, suggesting that this specific DNA change in 
the MYH16 gene could have played an important role 
in the evolution of the human brain. 

The normal MYH16 gene codes for a form of myosin, a
muscle protein that is made in both humans and chimps. A
mutation affecting MYH16 in humans produces defec- 
tive myosin proteins that actually weaken the jaw mus- 
cles. When muscles change in physical strength, the 
bones attached to the muscles often change as well. 
Over evolutionary time the weak jaw muscles could 
have reshaped the human skull sufficiently to accom- 
modate a larger human brain. Possibly the mutation in 
the human MYH16 gene allowed the subsequent evo- 
lutionary changes to occur. The MYH16 gene mutation 
appeared in the human population about 2.4 million 
years ago, which is about 400,000 years before human 
ancestors developed smaller jaw muscles and the 
human brain started to grow larger. 

We know that humans can talk and chimps cannot,
but scientists do not know when humans first acquired
speech. People who inherit a specific DNA mutation in
the FOXP2 gene have speech difficulties, indicating
that the FOXP2 protein is important for human speech.
Studies suggest that two DNA changes occurred in the
FOXP2 gene between 100,000 and 200,000 years ago,
which was long after humans diverged from the
chimpanzee. Researchers think that these changes in
the FOXP2 gene contributed to the development of
human speech over the past 200,000 years.
FIGURE 6.27 The human brain is larger than the chimp brain. 
Chimp brain and skull (top) compared to the larger human brain 
and skull (bottom). 


Studies on genetic variations in the genome sequences of 
human populations around the world might reveal impor- 
tant information about human evolution and assist in iden- 
tifying genes involved in human development and human 
diseases. 


WHAT WE STILL NEED TO LEARN ABOUT 
THE HUMAN GENOME 


Despite the great advances made in genome research 
and DNA biotechnology, there is still quite a lot about 
the human genome that remains a mystery. It is impor- 
tant to know much more about how our own genes 
work so we will be better able to develop effective 
treatments for diseases that occur when our genes go
wrong. The Human Genome Project gave the world
the first full “printout” of the human DNA master plan, 
but it still left many unanswered questions. 

Scientists continue to work on finding the genome 
locations and biological functions of all the human 
genes. In addition, research is needed to learn more 
about the impact of gene copy number and changes 
in gene regulation when cells are exposed to different 
environments and stress conditions. Researchers will 
also continue to learn more about chromosome organ- 
ization and the relationships between DNA sequences 
and landmark chromosome structures such as centro- 
meres and telomeres. The vast amount of noncoding 
DNA sequences in the human genome continues to 
raise many questions about the distribution, informa- 
tion content, and functional importance of noncoding 
DNA sequences. Of particular interest are DNA ele- 
ments that move or jump around the human genome 
as well as further work on the many types of repeated 
DNA sequences in our DNA. 


Benefits of Future Genome Research 


Future human genome research has much to offer
biomedical science, particularly through the correlation
of SNPs (single-base DNA variations among individual
genomes) with specific inherited diseases and traits,
such as disease susceptibility and diverse individual
reactions to drugs and other treatments. Scientists are
using many approaches, including molecular genetics,
to take advantage of the wealth of information generated
by the HGP to develop new drugs, some of which
are custom-designed drugs based on the genetic profile
of specific individual patients (personalized medicine;
see Chapter 13). In the future, scientists should be able
to routinely evaluate the health risks associated with
environmental exposure to radiation, chemicals, and toxins.
Large-scale genome sequencing and comparison 
studies on thousands of individual human genomes 
continue to reveal more about the inheritance of genes 
that control complex traits and the specific proteins 
involved in multigene diseases. Genome-wide gene 
expression profiles of human cells contribute to our 
understanding of the genes that control cell and tissue 
differentiation and cancer cell development. Studies
show that specific epigenetic changes to the human
genome help direct the differentiation of embryonic
stem cells into various adult cell types. These changes
also play a key role in producing new stem cells that
resemble embryonic stem cells but are derived from
adult cells rather than from embryos (see Chapter 12).
Microbial genomics research is increasingly impor- 
tant for our understanding of deadly pathogens and 
ways to rapidly detect and treat diseases caused by these 
microbes. The impending threat of global warming and 
the increases in oil and gas prices have focused public 
support on research to develop new sources of “green” 
sustainable energy, as well as better ways to detect and 
monitor environmental pollutants. Microbe-based meth- 
ods are under development that will safely and effi- 
ciently clean up toxic waste spills, which would protect 
the environment and save billions of dollars in future 
toxic cleanup costs. 


Genome DNA studies (genomics) will continue to reveal 
new insights into evolution, whereas the comparison of 
genome and protein sequences will tell us more about evo- 
lutionary conservation of protein structure and function. 


SUMMARY 


In this chapter we learned about the Human Genome 
Project, including some of the amazing people involved 
and the secrets revealed about human DNA and genes. 
The adult human body contains more than 220 differ- 
ent types of cells, and each cell contains chromosomes 
that carry human genome DNA. With few exceptions, all of
the cells in the same body contain the same genome DNA sequence.
The genetic instructions in the human genome are writ- 
ten in a DNA language that conveys the genetic infor- 
mation needed to make specific proteins that build and 
maintain each individual human being. 

The goal of the international Human Genome Project 
was to determine the DNA sequence of the entire 
“human genome,” the collection of 46 chromosomes 
that is created when an egg is fertilized by a sperm. 
Research on genomes other than the human genome 
predated the HGP and contributed in many ways to the 
eventual success of the HGP. Studies on small genomes 
helped scientists to develop the technological advances 
required to determine the DNA sequence of very large 
DNA genomes. 

The HGP research revealed surprising information 
about the organization of DNA sequences in the human 
genome. Possibly most intriguing is the finding that only
about 2% of human DNA actually codes for proteins.
The rest of the human genome contains large amounts 
of repeated DNA sequences and transposable DNA 
elements that have the ability to move around to differ- 
ent sites in the chromosome DNA. The human genome 
sequence information has had a huge impact on the 
development of genetic approaches to the diagnosis and 
treatment of human diseases and disorders. The success 
of the Human Genome Project gave the entire world 
access to the human master plan, the genetic information 
needed to create and build a living human being. 
REVIEW 


1. Explain which advances in DNA technology and 
instrumentation made the biggest impact on the 
success of the Human Genome Project. 

2. Describe the most surprising discovery revealed 
by the human genome DNA sequence. 

3. Describe what the DNA sequence of the dog 
genome revealed about how dogs evolved. 

4. Explain the basis for the legal, social, and ethical 
issues that were raised by the Human Genome 
Project research. 

5. Describe how geneticists create gene linkage 
maps and physical maps of chromosomes in the 
human genome. 

6. Summarize the process by which the sequence of 
DNA bases is determined using the Sanger chain 
termination method. 

7. Identify some genomes that have already been 
sequenced and explain what surprising facts were 
revealed by the new DNA genome sequence. 

8. Describe the properties of noncoding DNA and 
explain the impact of this finding on the structure 
and function of the human genome. 

9. Explain why genetic variation plays an important 
role in understanding genes involved in human 
diseases. 
10. Some noncoding DNA in the human genome 
contains elements that can move in the genome. 
Explain how this is accomplished. 
Warfarin Dosing Based on Genetics 


Genetic Testing Cited for Blood Thinner 


Associated Press, August 16, 2007 

By Andrew Bridges 

Federal health officials are stopping short of recom- 
mending genetic tests for patients on the blood-thinner 
warfarin, even though they have said such screenings 
could prevent thousands of complications each year. 
Warfarin, sold under the brand name Coumadin and in 
generic forms, on Thursday became the first widely used 
drug to include genetic testing information on its label. 
The information can help doctors determine how best to 
prescribe the drug. 

“This means personalized medicine is no longer an 
abstract concept but has moved into the mainstream,” the 
Food and Drug Administration’s clinical pharmacology 
chief, Larry Lesko, said in announcing the label change. 

The updated label for warfarin suggests that lower 
doses may be best for patients with variations in two 
specific genes. One produces an enzyme that helps the 
body metabolize warfarin and other medicines; the second 
produces the blood-clotting protein that warfarin blocks. 

The FDA has not changed its dosing recommenda- 
tions for the drug, and tailoring the proper dosage remains 
largely a matter of trial and error. 

A patient's age, weight, diet, and other prescription drug 
use all play a role in determining a proper dose. Patients 
taking too much warfarin can bleed to death. If people take 
too little of the drug, it can fail to protect them from deadly 
blood clots and stroke. 

Genetic testing can reveal which patients may require 
less of the drug and lead doctors to recommend doses 
closer to the lower end of the scale, FDA officials said. 

Rebecca Burkholder, vice president of health policy for 
the National Consumers League, said the FDA's action was 
a good first step. But she said that once patients are on the 
drug, they still must have regular blood tests to see if it is 
working properly. 


It is clear that the age of personalized medicine is 
here. Based on a patient's genetic profile, personalized 
doses of warfarin can be prescribed (see Chapter 13). 
Similarly, several types of cancer treatments have also
been shown to be more or less effective depending on a
patient’s genetic profile. The emerging field of
pharmacogenomics, which deals with, among other things,
the influence of genetic variation on an individual’s
response to drug therapy, depends heavily on the
emerging field of bioinformatics. The use of
bioinformatics to find genes associated with diseases and
treatments is strongly promoted by international
partnerships such as the HapMap Project. This group of
scientists is working to locate, identify, and disseminate
information about the 10 million or so sites of genetic
variation in human chromosomes (see Chapter 6). The
information available through the science of genetic
variation, combined with the power of bioinformatics,
will help move personalized medicine more rapidly into
the doctor's office and clinic.


LOOKING AHEAD 


Bioinformatics is a new area of science that functions 
at the interface of basic laboratory research and the 
computer. It has become an essential research tool to 
investigate the enormous amount of biological data, 
including sequence information, resulting from genetic 
research. On completing the chapter, you should be 
able to do the following: 


• Describe the need for and importance of archiving
biological data in computerized databases.

• Be familiar with different sequence databases,
including the basic features offered and the types of
sequence information available.

• Understand the tools available for analysis of the
vast amount of available biological data.

• Successfully carry out a basic BLAST search.

• Understand the connection between gene studies and
sequence analysis.


INTRODUCTION 


Over the past few decades, significant developments in 
molecular biology and genomics have generated stag- 
gering amounts of biological information, especially in 
the form of nucleic acid (DNA and RNA) and protein 
sequences. This huge amount of sequence information
has helped to propel bioinformatics into a critically
important interdisciplinary field that develops and
uses computational methods to store, organize, analyze, 
and manipulate vast amounts of biological data. Those 
involved in the field of bioinformatics make use of biol- 
ogy, computer science, mathematics, genetics, statistics, 
and several other areas of expertise to analyze DNA 
and protein sequence data, to study genomes, to predict 
nucleic acid and protein structure and function, and to 
apply these data to understand the workings of biologi- 
cal organisms. Bioinformatics has made it possible for 
laypeople to have access to information from research- 
ers around the world, and it has brought together a 
global scientific community connected by the need 
to examine, manipulate, and understand the message 
carried in the universal biological language of DNA. 
Although bioinformatics has expanded enormously since
the late 1990s, the basics of modern bioinformatics
were put into practice decades earlier by Margaret
Oakley Dayhoff (1925-1983), who is widely considered
to be the founder of the field of bioinformatics
(Figure 7.1). Starting with her Ph.D. work in 1948, she
integrated biological science and chemistry with the
new field of computer science. Dayhoff created the
first computerized banks of protein and DNA sequences,
and she developed many of the tools we use today for
designing and using computerized databases. In 1965,
Dayhoff published the first Atlas of Protein Sequence
and Structure, which contained all 65 protein sequences
known at the time. The original Atlas and its subsequent
volumes were extremely valuable both as reference works
and as a foundation of molecular sequences used to
study biological questions.

FIGURE 7.1 Margaret Oakley Dayhoff, founder of the field of
bioinformatics.

Many biological questions are asked and answered 
using the tools available in bioinformatics. For example: 


• What is the amino acid sequence of the protein
encoded by the DNA sequence of this gene?

• Is this protein sequence similar to other known amino
acid sequences? Does this sequence show an evolutionary
relationship to any others?

• What is the biological significance of the sequence
under study? What are the structure and function of
the protein?

• When, where, and under what conditions is the gene
expressed (copied into RNA)?

• Does the gene sequence contain a mutation that
might cause a disease in humans?
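The first question in the list, reading the protein sequence off a gene, is one of the simplest bioinformatics tasks. The minimal sketch below assumes a coding DNA sequence that begins at the first base of a codon and uses the standard genetic code. The 64-letter string is the standard code listed in TCAG order, and the example sequence is made up. Eukaryotic genes also contain introns, which must be removed (for example, by working from an mRNA or cDNA sequence) before translating. Letters other than A, C, G, T, and U are not handled.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (standard genetic code); '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")  # also accept RNA letters
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCCAAGTTTTGA"))  # MAKF
```

Building the table from a compact string keeps all 64 codons in one place. Each amino acid letter lines up with the codon generated at the same position by `product`.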


AN EXPLOSION OF DATA FUELED 
THE RISE OF BIOINFORMATICS 


Throughout the 1970s and 1980s, the rapidly increas- 
ing number of protein and nucleic acid sequences 
stored in computer databases led to the development 
of software programs designed to search for sequence 
characteristics such as identity or sequence similarities. 
FIGURE 7.2 Growth of the international nucleotide sequence database
collaboration.


The growth of genomic research has dramatically
increased the number of genomes sequenced since the
mid-1990s and caused a large increase in the amount
of available nucleic acid sequence data and protein
sequence and structure data. By August 2005, the
international nucleotide sequence databases had
collected and distributed 100 billion bases of sequence
data, representing both individual genes and partial
and complete genomes of more than 165,000 organisms
(Figure 7.2). Many more genomes have been sequenced
since 2005, including studies that require comparisons
of thousands of genome sequences from thousands of
individuals (see Chapter 6).

For this tremendous collection of biological data 
to be useful, scientists need to have easy access to the 
information and to the methods by which the sequence 
data can be manipulated, compared, and analyzed. 
This is where bioinformatics comes into play. Most 
biologists, and also many researchers who are experts 
in their own scientific area, are not trained as compu- 
tational scientists in terms of developing and analyzing 
the computer algorithms or understanding the step-by- 
step procedures needed to search and compare data. 
However, the ability to manipulate nucleic acid and 
protein sequences by computer is now an absolutely 
essential skill for all students of the biological sci- 
ences. Most of us will never become bioinformatics 
experts, but thanks to the development of user-friendly 
sequence analysis computer programs, it is reasonable 
to expect that, by now, both students and researchers 
should understand how biological data are organized, 
how to access the necessary databases and programs, 
and how to analyze the relevant data to explore vari- 
ous biological questions of interest. 

The availability of collective contributions of data 
from many scientists has changed the way we carry out 
research and has allowed scientists to solve problems 
and understand many aspects of basic biology that we 
would have thought impossible even as recently as the 
1980s. Imagine a researcher studying the Escherichia 
coli bacterial chromosome, which is tiny in size com- 
pared to the genomes of humans and other multicellu- 
lar eukaryotes. The roughly 4 million base pair genome 
in an E. coli bacterium is only about 0.1% as big as 
the human genome, yet even the most diligent and 
hardworking research scientist would not make much 
progress understanding the E. coli DNA sequence 
information without the use of bioinformatics. 
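The 0.1% comparison can be checked with a quick ratio. The figures below assume round numbers of about 4 million base pairs for E. coli and about 3 billion base pairs for the human genome. The human figure is an assumption here, and exact sizes depend on the strain and the genome assembly. With those numbers, the bacterial chromosome is a little over a tenth of a percent of the human genome:

```latex
\frac{4 \times 10^{6}\ \text{bp}}{3 \times 10^{9}\ \text{bp}}
  \approx 1.3 \times 10^{-3} \approx 0.13\%
```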


Due to user-friendly online programs that provide free access 
to the computational tools of bioinformatics, both students 
and researchers routinely use these resources to understand 
how biological data are organized in databases and use 
programs to explore the online information. 


SEQUENCE SIMILARITIES SUGGEST 
PROTEIN FUNCTION AND 
EVOLUTIONARY RELATIONSHIPS 


Sometimes scientists know the amino acid sequence 
of a protein, but not the function of the protein in 
the cells. However, comparison of the sequences of 
many known proteins in a database can often reveal 
sequence similarities between the unknown and known 
proteins. The sequences and function information 
about other, well-understood proteins provide clues 
as to the possible function of the unknown protein. 
A good example is the complex that carries oxygen 
in the blood, called hemoglobin. The globin proteins 
that make up hemoglobin have similar amino acid 
sequences, structures, and functions in many species 
besides humans. A newly discovered protein with an 
amino acid sequence similar to that of a globin protein 
is likely to have a similar function to hemoglobin. In 
addition, proteins can have different functions but still 
share some subset of sequences needed for a specific 
aspect of their overall job. For example, a specific sub- 
set of amino acids is found in proteins that bind to gua- 
nosine triphosphate (GTP). This subset of amino acids, 
which confers on the entire protein the ability to bind 
GTP, is found in many proteins that function by binding 
to GTP. Obviously a scientist finding this sequence in a 
newly discovered protein would have a potential clue 
to the function of the unknown protein. It most likely 
binds to GTP to do its job. 
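The GTP-binding example can be made concrete with a motif search. One well-known nucleotide-binding signature is the "P-loop" (Walker A motif). In the PROSITE pattern language it is commonly written as [AG]-x(4)-G-K-[ST]: alanine or glycine, any four amino acids, glycine, lysine, then serine or threonine. The sketch below assumes that pattern, rewrites it as a Python regular expression, and scans a protein sequence. The test sequence is the first 20 amino acids of human H-Ras, a GTP-binding protein. The same motif also occurs in many ATP-binding proteins, so a match is a clue rather than proof, which is why the text says the protein "most likely" binds GTP.

```python
import re

# P-loop (Walker A) pattern [AG]-x(4)-G-K-[ST], rewritten as a regular expression
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

def find_p_loops(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each P-loop-like match."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(protein.upper())]

# N-terminal 20 amino acids of human H-Ras, a GTP-binding protein
ras_fragment = "MTEYKLVVVGAGGVGKSALT"
print(find_p_loops(ras_fragment))  # [(10, 'GAGGVGKS')]
```

Positions are reported starting from 1, the convention biologists use when numbering amino acids, rather than Python's 0-based string indexes.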

TABLE 7.1 Sequence similarities among genomes of model organisms

Organism | Number of genes (estimated) | Percentage of genes similar to human genes | Description of the organism
Human (Homo sapiens) | ~20,500 | n/a | Reference for the comparisons in this table.
Mouse (Mus musculus) | ~25,000 | 90% | Mice are used to study many human diseases (see Chapters 6, 10). Illustration (Figure 7.4): A mutation in a single gene found in both mice and humans makes the mouse on the right have five times as much fat as her sister on the left.
Zebra fish (Danio rerio) | 16,456 | 85% | Zebra fish are used to study cell and tissue development and human diseases (see Chapter 10).
Fruit fly (Drosophila melanogaster) | 13,379 | 36% | For 100 years fruit flies have been a model organism for the study of genetics. The fruit fly is important for studying animal development and retinal degeneration in flies and humans. Illustration (Figure 7.5): The normal fly eye has 800 regular units (left). Putting a human gene into the fly causes the retina of the fly eye to rapidly degenerate (right). The closely related fly and human genes are used to study retinal diseases in humans.
Thale cress (Arabidopsis thaliana) | ~28,000 | 26% | Arabidopsis is a model flowering plant, but has a basic cell biology that is similar to that of humans and animals (see Chapter 15).
Yeast (Saccharomyces cerevisiae) | 5,770 | 23% | Single-celled yeast is a eukaryotic model organism for genetic research that played a key role in understanding cell cycle regulation (see Chapter 9).
Roundworm (Caenorhabditis elegans) | 19,427 | 21% | The roundworm provides an excellent model system to study tissue development and to analyze genes involved in aging and neurological diseases. Illustration (Figure 7.6): A single gene determines whether a worm becomes a loner or prefers to eat in company. This research provides clues to the basis of social behavior in humans. The green color is from the green fluorescent proteins (GFP) expressed by the worms.
Bacteria (Escherichia coli K12) | 4,377 | 7% | E. coli is the king of the model organisms and has revealed a lot about the basic processes of DNA replication, transcription, and translation in prokaryotes.

Gene estimates and numbers updated 2009.

FIGURE 7.3 Phylogenetic tree of life. The phylogenetic tree shows the separate domains for bacteria, archaea, and eukaryotes. This tree was
based on RNA sequence comparison studies and was proposed by scientist Carl Woese. The exact relationships among the three domains are
still controversial, but the existence of three domains is supported by a large amount of data available in biological databases.

The principle of sequence similarity applies to
DNA and RNA sequences as well as proteins. The
sequences of newly found genes are compared with those
in various databases to help determine the identity and
function of the new gene. Even though people look
very different from mice or roundworms, humans share
a remarkable number of genes with these and other
model organisms (Table 7.1). The term “model” refers
to the fact that model organisms such as the mouse,
the fruit fly, and the bacterium E. coli are the focus of
very active research and have been very well characterized.
Although the genes in mouse and man are not
identical, many of the proteins encoded by these
organisms have similar functions, suggesting that the
genes and their products are comparable. The similar DNA
sequences described in this example are considered to
be homologous, because the two (or more) genes have
highly similar DNA sequences (nearly the same linear
order of DNA bases along the strand).
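Similarity between sequences can be scored in many ways. The minimal sketch below shows one crude approach, assuming short DNA strings and made-up sequence names. It breaks each sequence into overlapping "words" of length k and measures what fraction of words two sequences share (the Jaccard index), then ranks database entries by that score. Search tools such as BLAST also start from short shared words, but they then extend and statistically score each match. This sketch does none of that and is not BLAST.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings ("words") of length k in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmer_score(query: str, subject: str, k: int = 4) -> float:
    """Jaccard similarity of the two k-mer sets: shared words / all distinct words."""
    a, b = kmers(query, k), kmers(subject, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_by_similarity(query: str, database: dict[str, str], k: int = 4) -> list[tuple[str, float]]:
    """Sort database entries from most to least similar to the query."""
    scores = [(name, shared_kmer_score(query, seq, k)) for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical toy sequences (not real genes)
database = {
    "gene_A": "ATGGCGTACCTGAAGTTCGA",
    "gene_B": "TTTTCCCCGGGGAAAATTTT",
}
query = "ATGGCGTACCTGAAGTTCAA"
for name, score in rank_by_similarity(query, database):
    print(name, round(score, 2))
```

A high score only says that two sequences look alike. Deciding whether they are homologous still rests on biological reasoning about common ancestry, as the next paragraph explains.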

Sequence similarity is often an indication that the
genes in question may have originated from a distant
common ancestor. As genes evolve (through random
mutations in the DNA), natural selection leads to the
retention of sequences that are critical to the survival of
the individual. Genes and the proteins they encode retain
functionally important sequences through evolution with
few changes. Many genes with similar sequences are
found in seemingly distant species, such as the E. coli
bacterium and a human being. Genes and other DNA
and RNA sequences are routinely used to deduce
previously unknown evolutionary relationships among
biological organisms. In fact, extensive studies comparing
gene and protein sequences from many organisms
ultimately led to the reclassification of all biological
organisms from the previous two divisions of prokaryotes
and eukaryotes into three new domains: Archaea,
Eubacteria, and Eukarya (Figure 7.3). The analysis of
ribosomal DNA sequences, the genes that encode the
ribosomal RNAs (rRNAs), has enhanced our understanding
of the relationship between land animals and their aquatic
ancestors. Mitochondrial DNA sequences such as the
cytochrome b and rRNA genes were used to reveal the
family relationships among the giant tortoises on Darwin's
Galapagos Islands.

FIGURE 7.4 Genes influence obesity. A mutation in a single gene
found in both mice and humans makes the mouse on the right have
about five times as much fat as her sister on the left.

The research on genome sequences from many
organisms has led to important medical and agricultural
advances. For example, studies conducted on nematodes
(Caenorhabditis elegans) and fruit flies (Drosophila
melanogaster) provided the scientific community with
fundamental information about the structure and function
of cellular receptors and the process of intracellular
signal transduction. These cellular receptors often have
counterparts in humans; a specific receptor found in both
humans and fruit flies is implicated in an inherited form
of colorectal cancer. Research on the plant Arabidopsis
thaliana revealed new information about the synthesis of
vitamin E that could allow researchers to increase vitamin E
production in soybean crops.

FIGURE 7.5 (A) A normal fly eye has equal amounts of normal (red)
and mutated (white) tissues. (B) Flies with mutations in growth
restriction genes have a larger proportion of white, mutated tissue
than normal red tissue in their eyes.

FIGURE 7.6 Transgenic C. elegans roundworm glows with green
fluorescence. A single gene determines whether a worm becomes a
loner or prefers to eat in company. This research is designed to
provide clues to the basis of social behavior in humans. The green
color comes from green fluorescent protein (GFP) made from foreign
GFP genes introduced into the worms.
Studies on cell receptor proteins in many organisms help
scientists to better understand how cell receptor proteins
function in humans. A mutation in a specific receptor 
protein found in humans and fruit flies is implicated in 
colorectal cancer, providing a new model system to study 
this cancer. 


BIOLOGICAL DATA ARE ORGANIZED 
IN COMPUTER DATABASES 


A biological database is simply a collection of biologi- 
cal data that is organized in a specific and useful way. 
Bioinformatics databases are very large, accessible by 
computer on the Internet, and must be continuously 
updated with new information, revisions, and correc- 
tions in order to be maximally useful. The computer- 
ized interfaces to the databases are designed to be 
user friendly and they allow researchers to ask for and 
receive information from the database online. In the 
terminology of bioinformatics, a request to the data- 
base is known as a query and the information obtained 
from a query to the database is a result. 

The advent of bioinformatics databases has led to a 
new research approach called database mining, which 
is similar to mining for gold as it involves sifting through 
a tremendous amount of starting material to find com- 
paratively tiny amounts of valuable “nuggets.” In bio- 
informatics, the starting material is the vast amount of 
information in the database(s), and the nuggets are the 
few pieces of data that are of interest to a particular 
researcher. In bioinformatics research, each bit of data is 
a potential nugget; different researchers sifting through 
the same starting material are looking for very different 
nuggets. Most important, however, to make it possible 
to sift through information to find valuable nuggets, the 
data must be present in an orderly database. 
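To make the terms query, result, and database mining concrete, here is a minimal Python sketch of a toy in-memory database. It is hypothetical: the `Record` fields, the placeholder accessions REC_0002 to REC_0004, and the `run_query` helper are invented for illustration. Only NM_021050, the mouse Cftr record discussed later in this chapter, comes from the text. Real databases such as GenBank are vastly larger and are searched through their own web interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    accession: str
    organism: str
    gene: str
    molecule: str  # "DNA", "RNA", or "protein"


# Hypothetical toy database; only the first entry is taken from this chapter.
DATABASE = [
    Record("NM_021050", "Mus musculus", "Cftr", "RNA"),
    Record("REC_0002", "Homo sapiens", "CFTR", "DNA"),
    Record("REC_0003", "Drosophila melanogaster", "Rh3", "protein"),
    Record("REC_0004", "Homo sapiens", "DMD", "RNA"),
]


def run_query(database, **criteria):
    """Return the result of a query: every record whose fields match all criteria."""
    return [
        record
        for record in database
        if all(getattr(record, field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    # Two researchers "mine" the same records for different nuggets.
    for record in run_query(DATABASE, organism="Homo sapiens"):
        print(record.accession, record.gene)
    print(run_query(DATABASE, gene="Rh3"))
```

Each call to `run_query` plays the role of a query and the returned list is its result; filtering the same records on different fields mirrors how different researchers sift one database for very different nuggets.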

Biological databases are maintained by a number 
of private and government organizations, both in the 
United States and elsewhere in the world. Biological 
databases typically contain collections of nucleic acid 
sequences (DNA or RNA), protein sequences, genome 
sequences, and literature resources, but many data- 
bases also contain information relevant to answering 
specific biological questions. These databases have 
become invaluable tools for researchers worldwide. 


Nucleic Acid Sequence Databases 


Nucleic acid sequences (DNA and RNA) are stored in 
three comprehensive databases on the Internet, which are 
free and easily accessible to the public, allowing scien- 
tists from all over the world to share and compare data. 
• GenBank (www.ncbi.nlm.nih.gov/Genbank/index.html): Maintained at the National Center for Biotechnology Information (NCBI) in Maryland, United States

• EMBL (European Molecular Biology Laboratory; www.ebi.ac.uk/embl): Maintained at the European Bioinformatics Institute in Cambridge, United Kingdom

• DDBJ (DNA Data Bank of Japan; www.ddbj.nig.ac.jp): Maintained at the National Institute of Genetics in Mishima, Japan


Together, the GenBank, EMBL, and DDBJ databases make up the International Nucleotide Sequence Database Collaboration, which stays current by sharing updated information daily. Within each database, sequence files, called
“records,” are available as online web pages. Researchers 
can submit queries about specific sequences in a data- 
base and have quick and easy access to the results. In 
addition to the actual DNA or RNA sequences, the data- 
bases also contain information about gene function, 
encoded proteins, mutations, regulatory DNA sites, ref- 
erences, and links to other web sites where even more 
information can be found. The query results contain any 
information available in the database and linked to the 
sequence being queried. 

GenBank was one of the first sequence databases 
available to researchers, and is a good example of how 
sequence databases work. The result of a query about 
the cystic fibrosis transmembrane conductance regula- 
tor (CFTR) gene from mouse shows part of the GenBank 
record page (Figure 7.7). The mouse CFTR protein is 
very similar in sequence to the human version. The CFTR protein functions in the transport of chloride ions across cell membranes. When the human CFTR does not function correctly, the consequence is the serious genetic disease cystic fibrosis. The CFTR record demonstrates some of the key features of all GenBank entries:


• Gene name (cystic fibrosis transmembrane conductance regulator).

• Accession number or unique identifier for the sequence record (NM_021050 and XM_622568). This number can be used to access the sequence record quickly.

• Source organism (Mus musculus) and taxonomy.

• Bibliographical references along with a link to a journal article resource for each published sequence in the database (this record originally had 10 references).

• Summary of information known about the sequence and its encoded protein.

• Coding region sequence (CDS), which indicates the specific nucleotides corresponding to the amino acid sequence of the encoded protein (bases 138-4568 are the coding region), and the amino acid sequence of the protein translated from the CDS.

• Origin: the original nucleotide sequence submitted to the database.

To view the current CFTR record online, go to www.ncbi.nlm.nih.gov. In the drop-down menu following “Search,” choose CoreNucleotide, and type the accession number NM_021050 into the search blank.
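In GenBank's plain-text (“flat file”) format, the fields listed above appear as labeled lines. The Python sketch below reads a few lines modeled on the mouse Cftr record in Figure 7.7 and computes the length of the coding region from its CDS coordinates. It is a simplified, hypothetical parser: it understands only a plain `start..end` CDS location on one line and ignores every other field, whereas complete GenBank readers are available in libraries such as Biopython.

```python
import re

# A few lines in GenBank flat-file layout, modeled on the mouse Cftr record (NM_021050).
RECORD_TEXT = """\
ACCESSION   NM_021050 XM_622568
SOURCE      Mus musculus (house mouse)
     CDS             138..4568
                     /gene="Cftr"
"""

CDS_LINE = re.compile(r"\s+CDS\s+(\d+)\.\.(\d+)\s*$")


def parse_record(text):
    """Pull accession numbers, source organism, and CDS span (1-based, inclusive)."""
    info = {}
    for line in text.splitlines():
        if line.startswith("ACCESSION"):
            info["accessions"] = line.split()[1:]
        elif line.startswith("SOURCE"):
            info["source"] = line[len("SOURCE"):].strip()
        else:
            match = CDS_LINE.match(line)
            if match:
                info["cds"] = (int(match.group(1)), int(match.group(2)))
    return info


record = parse_record(RECORD_TEXT)
start, end = record["cds"]
cds_length = end - start + 1  # both endpoints are part of the coding region
print(record["accessions"], record["source"])
print(cds_length, "nucleotides =", cds_length // 3, "codons")
```

Because GenBank coordinates include both endpoints, bases 138 through 4568 span 4568 - 138 + 1 = 4431 nucleotides, which divides evenly into 1477 three-base codons.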


The major databases share and integrate the data 
from different sources, and each database record pro- 
vides links to other databases and other online sources 
with additional information (Figure 7.2). This feature is an advantage for researchers because it makes it easy to move between related database entries. For example, some records contain information about regions of biological significance (e.g., known mutations or functional domains of encoded proteins) and include additional information about the sequence or the protein product (e.g., whether the DNA or protein is likely to be modified).
Much of the information at the NCBI site has direct 
links to other relevant NCBI pages, web sites, and 
resources for further information. 


GenBank information is usually accessed through Entrez: The Life Sciences Search Engine web page (www.ncbi.nlm.nih.gov/gquery/gquery.fcgi). Entrez is a user-friendly interface that provides easy access to the GenBank database as well as to a variety of other useful biological databases.


The Entrez search web page shown in Figure 7.8 
provides links to numerous cross-referenced databases 
that are available without cost to the public. Each of the 
databases listed on the web page also links to a small 
popup window with a short description of the web page 
offering. Scientists from many different areas of the life 
sciences use these databases for a variety of reasons. 
The Entrez page also provides access to other DNA, 
RNA, and protein sequence databases. 
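Entrez can also be searched from a program rather than a web page, through NCBI's E-utilities web service, which this chapter does not otherwise cover. The sketch below only builds request URLs for two of those utilities, `esearch.fcgi` (find the IDs of matching records) and `efetch.fcgi` (download a record); it does not contact NCBI. The database names `pubmed` and `nuccore` and the `rettype`/`retmode` values are assumptions to check against NCBI's E-utilities documentation, which also sets usage limits.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(database, term):
    """URL asking Entrez which record IDs in `database` match a search term."""
    return EUTILS_BASE + "esearch.fcgi?" + urlencode({"db": database, "term": term})


def efetch_url(database, record_id, rettype="gb", retmode="text"):
    """URL that downloads one record; rettype="gb" requests GenBank flat-file text."""
    params = {"db": database, "id": record_id, "rettype": rettype, "retmode": retmode}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


print(esearch_url("pubmed", "cystic fibrosis CFTR"))
print(efetch_url("nuccore", "NM_021050"))

# Actually retrieving a record requires network access, for example:
#   from urllib.request import urlopen
#   text = urlopen(efetch_url("nuccore", "NM_021050")).read().decode()
```

Building the URLs separately from fetching them keeps the sketch testable offline; `urlencode` takes care of escaping spaces and other special characters in the search term.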

The wealth of online information is a real gold 
mine to a scientist who has just found a new protein 
that plays an important role in a disease but who has 
no idea what the protein might do in the cell. To try to 
uncover the possible function of a protein, it is useful 
to find homologs, proteins with known functions that 
are similar but not identical in amino acid sequence to 
the new protein. Using the Entrez search page, a sci- 
entist looking for homologs of a new protein can use 
the HomoloGene link to search among the genes of 
several completely sequenced eukaryotic genomes or 
can access the Conserved Protein Domain Database 
(CDD) link to look for clusters of conserved amino acids 
among known proteins. Sequence similarities between 
proteins (the entire sequence or short regions) often pro- 
vide clues about the function of a new protein or about 
the functions of structural domains in the new protein. 
NM_021050. Reports Mus musculus cyst...[gi:116008179]

DEFINITION  Mus musculus cystic fibrosis transmembrane conductance regulator
            homolog (Cftr), mRNA.                  [Gene name]
ACCESSION   NM_021050 XM_622568                    [Unique gene ID #]
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus                           [Source organism and taxonomy]
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 6305)                   [Reference (complete record has 10 references)]
  AUTHORS   Xu,W.M., Shi,Q.X., Chen,W.Y., Zhou,C.X., Ni,Y., Rowlands,D.K.,
            Yi Liu,G., Zhu,H., Ma,Z.G., Wang,X.F., Chen,Z.H., Zhou,S.C.,
            Dong,H.S., Zhang,X.H., Chung,Y.W., Yuan,Y.Y., Yang,W.X. and Chan,H.C.
  TITLE     Cystic fibrosis transmembrane conductance regulator is vital to
            sperm fertilizing capacity and male fertility
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 104 (23), 9816-9821 (2007)
   PUBMED   17519339

Summary: The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MRP subfamily which is involved in multi-drug resistance. This gene encodes the cystic fibrosis transmembrane regulator and a chloride channel that controls the regulation of other transport pathways. Mutations in this gene have been associated with autosomal recessive disorders such as cystic fibrosis. Alternative splicing of exons 4, 5, and 11 have been observed, but full-length transcripts have not yet been fully described.   [Summary of CFTR information]

     CDS             138..4568                     [The coding region: bases 138-4568 contain codons]
                     /gene="Cftr"
                     /translation="MQKSPLEKASFISKLFFSWTTPILRKGYRHHL...
                     ...ELSDIYQAPSADSADHLSEKLEREWDREQ...
                     ...EVQETRL"                   [The translated amino acid sequence (complete record contains 1477 amino acids)]
ORIGIN                                             [Original nucleotide sequence submitted to database (complete record contains 6305 nucleotides)]
        1 aattggaagc aaatgacatc acctcaggtc tgagtaaaag ggacgagcca ...
     6241 atgtattata tttattactg taatagaata tcatgtgtca ataaaatcct tttatttgtg
     6301 tgaaa

FIGURE 7.7 Partial GenBank entry for a mouse cystic fibrosis transmembrane conductance regulator gene. This example shows the key features of GenBank entries, including gene names/identifiers, nucleotide sequence, translated amino acid sequence (using single-letter abbreviations), basic information about the encoded protein, information about the source organism, and links to literature resources.


In 2007, Entrez added a link to three-dimensional 
macromolecular structures. The Molecular Modeling 
Database (MMDB) contains more than 40,000 
three-dimensional macromolecular structures, includ- 
ing DNA, RNA, and proteins. The MMDB is linked 
to the other NCBI databases, including sequences, 
bibliographical citations, taxonomic classifications, 
and sequence and structure neighbors. This new 
feature at Entrez permits a query search to provide 


everything from simple primary sequence to final 
three-dimensional molecular structures, linked to other 
relevant information all along the way. 

The data manipulations and databases described 
here are fundamental to understanding the secrets 
underlying linear sequences of DNA letters or amino 
acids. Although fairly basic, the examples described 
here demonstrate the power and potential of the tools 
available to scientists through bioinformatics. 
FIGURE 7.8 The Entrez Search Engine home page. The Entrez Search Engine home page illustrates the vast number and variety of databases that can be used in biological research.


In addition to providing access to sequence data retrieval and data analysis tools, Entrez also contains a link to the
scientific literature database at the National Library 
of Medicine (NLM) called PubMed. This link provides 
access to the NLM biomedical literature citations and 
abstracts, and to PubMed Central, which allows free 
access to many selected full-text journal articles. Entrez 
and PubMed provide the scientific community and the 
world with literally millions of published scientific arti- 
cles and references and a vast array of resources that 
are available without ever leaving the computer. 


Protein Sequence Databases 


Information about DNA sequences is an important starting point for research, but it is the proteins encoded
by the genes that determine the structure, function, 
and behavior of an organism. Mutations in genes can 
lead to the production of defective proteins, which can 
lead to disease and disorders. For example, the cystic 
fibrosis transmembrane conductance regulator pro- 
tein (CFTR, Figure 7.7) is normally found in a variety 
of epithelial tissues in the body, including the lungs. 
Box 7.1 Literature Databases


Through NCBI or through Entrez, a researcher can link to several literature databases, type in key words and phrases to begin a search, and provide hints for narrowing subsequent searches. PubMed accesses the more than 15 million citations found on MEDLINE, an indexing service for medical research (provided by the National Library of Medicine). Each abstract has a “related articles” link that allows a researcher to easily expand the search.

• PubMed Central is an archive of journal literature encompassing the life sciences. This database provides free access to the full text of more than 150 life science journals.

• Bookshelf contains online biomedical textbooks.

• Online Mendelian Inheritance in Man (OMIM) is a comprehensive and continuously updated listing of human genes and genetic disorders.

• Online Mendelian Inheritance in Animals (OMIA) is a dataset of traits, genes, and inherited disorders in animal species other than mouse and human.

• Coffee Break provides brief reports usually based on recently published peer-reviewed literature. The work is first put in a broad context and then narrows to potential applications. Links are provided to demonstrate how bioinformatics tools were used in the research process.


Powered by ATP, the CFTR protein transports chloride ions (Cl⁻) across epithelial cell membranes. In the respiratory tract, the passage of Cl⁻ is followed by the passage of water, which makes the mucus lining the airway thin and fluid. When the CFTR gene is mutated, the mutant CFTR protein cannot fold into its normal three-dimensional shape. Scientists now know that protein function almost always depends on the ability of the protein to fold into its correct three-dimensional structure. In cystic fibrosis patients the mutation in the CFTR protein prevents the Cl⁻ ions from being transported and the airways fill with thick, sticky mucus.
This leads to severe respiratory problems, one of the 
hallmarks of this genetic disease (see Chapter 10). 

Dayhoff’s Atlas and more recent research on pro- 
tein evolution led to the development of the Protein 
Sequence Database (PSD), which since 1984 has been 
maintained by the Protein Identification Resource (PIR) 
at Georgetown University (www.pir.georgetown.edu), 
which provides free information and sequence analy- 
sis tools. 

In 2002, PIR and its international partners, the 
European Bioinformatics Institute (EBI; www.ebi.ac.uk) 
and the Swiss Institute of Bioinformatics (SIB; www.isb- 
sib.ch), created UniProt (www.pir.uniprot.org), a single, 
worldwide, comprehensive database dedicated to protein sequence and function. UniProt allows researchers to query and analyze sequence information and provides links to additional information about the protein of interest. The results include descriptions of functions, domain structures, posttranslational modifications, mutations, and much more. UniProt also provides links to other pages offering abundant external resources for the scientific researcher.

The UniProt record for the Drosophila melanogaster 
(fruit fly) opsin protein indicates that the opsin proteins 
are part of the pigment components in the eye that 
absorb light and are necessary for vision. 

The UniProt opsin record demonstrates some of the 
key features available using the UniProt data files: 


• Protein name (Opsin Rh3)

• Accession numbers (P04950; Q9TX53)

• Source organism (Drosophila melanogaster) and taxonomy

• Amino acid sequence of the protein with associated information

• Bibliographical references along with links to articles

• Comments about protein function, subcellular location (if known), and other relevant information about the protein

• Database cross-references and links to additional resources and information

• Special features specific to the opsin protein, such as identifying the amino acids that span the membrane and noting which end of the protein (amino or carboxyl terminus) is extracellular and which end resides in the cytoplasm.
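UniProt and the NCBI databases can export sequences in FASTA format, a plain-text layout in which a header line beginning with “>” is followed by one or more lines of sequence. A minimal reader, assuming well-formed input, is sketched below; the two records are made-up toy examples, not real UniProt entries, and any sequence lines that appear before the first header are simply ignored.

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA text; headers are stored without '>'."""
    records = {}
    header, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # finish the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # a sequence may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Two made-up protein records written with one-letter amino acid codes.
EXAMPLE = """\
>toy_protein_1 hypothetical example
MKTAYIAKQR
QISFVKSHFS
>toy_protein_2 hypothetical example
MSTAVLENPG
"""

for name, sequence in parse_fasta(EXAMPLE).items():
    print(name, len(sequence))
```

Joining the wrapped lines back into one string per record is the main job of a FASTA reader; once that is done, the sequence can be passed to other tools such as alignment programs.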


Genome Databases 


The scientific discipline called genomics greatly ben- 
efitted from the technical advances that allowed entire 
genomes to be rapidly sequenced, which contributed to 
the avalanche of data available for biological research. 
Genomics refers primarily to studies on whole sets of 
genes or entire genomes. Genome databases have been 
developed for many model organisms including the bac- 
terium Escherichia coli, budding yeast (Saccharomyces 
cerevisiae), fission yeast (Schizosaccharomyces pombe), 
nematodes (a type of roundworm, Caenorhabditis ele- 
gans), fruit flies (Drosophila melanogaster), the mustard 
plant (Arabidopsis thaliana), mice (Mus musculus), 
zebra fish (Danio rerio), and many other prokaryo- 
tic and eukaryotic organisms (including humans) and 
viruses (Table 7.1). Computer databases are required for 
scientists to manage, organize, and use vast amounts of 
genomic information. By mid-2007, the NCBI database contained whole and partial genome sequence data for more than 1000 organisms. Database tools allow scientists to identify genes and gene families within specific genomes, to localize (or “map”) genes to specific locations in a genome, and to analyze evolutionary relationships between organisms by comparing entire genomes (human cells, for example, contain 46 chromosomes).

FIGURE 7.9 Expression profile of the dystrophin gene. This data set examines the results of skeletal muscle biopsies analyzed to compare the expression of the dystrophin gene in Duchenne muscular dystrophy (DMD) patients with its expression in unaffected individuals.

Researchers formed consortiums dedicated to the “maintenance and repair” of the genome sequences in the databases and to maintaining information on the molecular genetics of specific model organisms. The web sites maintained by
these consortiums, in addition to containing a multitude 
of links to other sequence and literature databases, also 
contain information about members of the scientific 
community and instructions on obtaining lab strains or 
stocks of a particular model organism. The high degree 
of integration among the genomic databases and their 
information has made sharing scientific knowledge and 
materials easy, commonplace, and expected. Many journals follow the rule that once an experiment is published, the reagents used in it (solutions and other substances), including biological reagents such as strains, are in the public domain and should be made available free of cost (except for shipping).


Researchers send specific DNA, RNA, or protein amino 
acid sequences to online sites that rapidly compare, search, 
or otherwise manipulate the query sequences. Online data- 
bases provide extensive information about gene function 
and mutations, protein structures, DNA control regions, 
literature references, links to other web sites, and more. 


Gene Expression Databases 


The phenotype of an organism—its physical character- 
istics, the way it interacts with the environment, and 
its diseases—depends on which genes are expressed 
throughout the lifetime of the organism. Gene expres- 
sion varies in different cell types and changes in any 
given cell over time, based on its developmental stage 
and the environment. High-throughput microarray methods have been developed to reveal the patterns of genes expressed in many cells under different conditions (see Chapter 13). Expressed sequence tags (ESTs)
are DNA copies made from the ends of mRNAs. These 
short DNA strands represent a collection of genes 
expressed in a set of cells (see Chapter 6). Full-length 
DNA copies of messenger RNA transcripts are called 
complementary DNAs (cDNAs). ESTs and cDNAs 
are used as molecular tools to identify genes, cod- 
ing sequences, and patterns of gene expression in 
specific cells and tissues. Microarray analysis per- 
mits the simultaneous detection of all of the mRNAs 
transcribed from thousands of genes in a genome at 
any one point in time (see Chapter 13). This allows 
scientists to study how the expression levels of differ- 
ent genes are regulated over time, in different cells 
and tissues, and in response to hormone changes and 
environmental signals. In addition to databases con- 
taining gene and protein sequences, online databases 
also store, organize, access, and analyze RNA and 
protein gene expression data generated from EST and 
microarray tests. 
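Microarray results are commonly summarized by comparing the signal for a gene under two conditions, often as the base-2 logarithm of the ratio (the log2 fold change). That convention is not described in this chapter, and the signal values below are hypothetical rather than taken from GEO; the sketch only shows the arithmetic.

```python
import math
from statistics import mean


def log2_fold_change(condition, reference):
    """log2(mean signal in condition / mean signal in reference).

    Positive values mean higher expression in `condition`; negative values mean lower."""
    return math.log2(mean(condition) / mean(reference))


# Hypothetical microarray signal intensities for one gene (arbitrary units).
patient_samples = [110.0, 90.0, 100.0]
control_samples = [420.0, 380.0, 400.0]

print(log2_fold_change(patient_samples, control_samples))  # -2.0
```

A value of -2 means the average signal in the first group is one-quarter of that in the second; using a logarithm makes a twofold increase (+1) and a twofold decrease (-1) symmetric around zero.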

The Gene Expression Omnibus (GEO) reposi- 
tory at NCBI stores and freely distributes microarray, 
EST, and other forms of high-throughput data that are 
submitted by scientists around the world. In addition 
to its archival function, GEO also provides data min- 
ing tools that allow researchers to query, retrieve, and 
download data relevant to their personal research 
interests. Figure 7.9 shows an example of data mined through GEO: the expression of the human dystrophin gene. The dystrophin protein is important for muscle structure, and defects in the dystrophin gene (and protein) cause muscular dystrophy (see Chapter 10).

GEO is the largest fully public online gene expression 
resource with more than 120,000 samples from more 
than 200 organisms (representing 3.2 billion individual 
measurements). 
Proteome Databases


Understanding gene expression is a key step toward 
understanding how a cell or an organism works, but 
gene expression does not tell the whole story. The 
mere observation that a gene is transcribed into RNA 
(as indicated by ESTs, microarrays, or another method 
to detect transcription) does not mean that a functional 
protein is produced inside the cell. Often proteins 
do not act alone but form complexes with other pro- 
teins or bind to components such as DNA, RNA, and 
membranes. Scientists are studying the interactions of proteins with one another and with nucleic acids, with the goal of understanding all of the interactions required for a cell to function.

The new, exciting field of proteomics is the study
of the proteome. The term “proteome” is analogous to 
the term “genome” but (so far) is less well defined. 
Scientists working on the Human Genome Project 
referred to the proteome as the “proteins expressed by 
a cell or organ at a particular time and under specific 
conditions.” However, the term has also been used to 
refer to the complete set of proteins expressed throughout all of the developmental stages of an organism. When the word proteome is used, it is important to read carefully to understand how the author is using the term.

The major goals of proteomics are to identify, cata- 
log, and understand the structure and function of a set of 
proteins. The fact that there are far fewer human genes 
than human proteins indicates that the protein comple- 
ment of an organism cannot be fully characterized by 
gene expression analyses alone, making proteomics a 
necessary tool to understand the complexities of living 
cells (see Chapter 6). 

The amount of protein expression data in computerized databases is still very small compared with the amount of gene expression data. There are probably
two reasons that protein data has not yet caught up 
with gene expression data. First, working with proteins 
is difficult in general, and the methods used to analyze 
protein expression and protein-protein interactions 
(such as two-dimensional electrophoresis and two- 
hybrid analysis) are technically challenging and time 
consuming. In addition, other methods, such as mass 
spectrometry, are relatively new techniques applied 
to protein expression analysis, with limited data yet 
online. The proteomics databases are maintained by 
the NCBI and the European Bioinformatics Institute, 
as well as specialized databases established by model 
organism consortiums and various research groups. 
The international collaboration coordinated by the 
Human Proteome Organization (HUPO) is cataloging 
all human proteins, their functions, and protein-pro- 
tein interactions, so coming years are likely to see a 
large increase in the amount of protein expression data 
available in online databases. Without the benefit of 
modern online bioinformatics tools, it took nine years
to identify the cystic fibrosis transmembrane conduct- 
ance regulator gene (CFTR). Later, with access to com- 
puterized databases and microarray technology, it took 
only nine days to find one of the human genes causing 
Parkinson’s disease. 


USING BIOINFORMATICS DATABASES 


The collection and storage of sequence and other infor- 
mation is extremely important, but it means nothing if 
people cannot find and analyze the data to help answer 
biological questions. Successful use of the various data- 
bases takes practice and the initial experience can be 
a bit overwhelming. But with experience, people can 
develop precise and useful queries and access informa- 
tion that leads to the answers for biological questions. 

The next part of this chapter will not make the reader 
an expert in searching databases and analyzing data, 
but will help everyone to become familiar with the 
basics needed to navigate the computer databases. The 
databases often contain tutorials that help the reader to 
become skilled at finding information. NCBI has a com- 
prehensive list of tools for data mining available on its 
site (www.ncbi.nlm.nih.gov/Tools). 


How to Translate a DNA Sequence into an 
Amino Acid Sequence 


Given that most genes encode proteins, scientists usu- 
ally first determine the DNA sequence and then deduce 
the amino acid sequence of the protein encoded by the 
gene. In the cell the gene coding for a protein prod- 
uct is first transcribed (copied) from DNA into mRNA, 
and then the 3-base RNA codons are translated by the 
ribosome into a specific order of the amino acids in 
the protein. As a result of knowing the genetic code, it 
is possible to scan through all of the possible reading 
frames in a given DNA coding region and determine 
the most likely amino acid sequence(s) predicted by the 
gene sequence. Although for short genes this is possi- 
ble to do by hand, it is time consuming and highly error 
prone. Fortunately, bioinformatics tools in the form of 
computer programs can quickly predict the most likely 
gene product (i.e., the amino acid sequence) using the 
genetic code and the three reading frames possible on 
each DNA strand (Figure 7.10). Two of the three reading 
frames can often be eliminated from consideration by 
the presence of stop codons early in the sequence. 
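The three-frame translation described above can be sketched in a few lines of Python. This is a minimal illustration, not the program used for Figure 7.10: it uses the standard genetic code, writes amino acids as one-letter codes with “*” for a stop codon, marks any codon containing a non-ACGT character as “X” (an assumption of this sketch), and includes a reverse-complement helper so that the three frames of the other strand can be examined too. The short example sequence is made up for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG
# map to these one-letter amino acid codes; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, frame=0):
    """Translate reading frame 0, 1, or 2 of a DNA strand written 5' to 3'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def reverse_complement(dna):
    """Return the opposite strand, also written 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def three_frames(dna):
    return [translate(dna, frame) for frame in range(3)]


def frames_without_internal_stops(dna):
    """Frames whose translation has no stop codon before its final codon."""
    return [frame for frame, protein in enumerate(three_frames(dna))
            if "*" not in protein[:-1]]


EXAMPLE_DNA = "ATGCTAGCTAATGACTGA"  # made-up coding strand
print(three_frames(EXAMPLE_DNA))                   # ['MLAND*', 'C*LMT', 'AS**L']
print(frames_without_internal_stops(EXAMPLE_DNA))  # [0]
```

As in the text, two of the three frames are ruled out because stop codons interrupt them early, leaving frame 0, which starts with Met and ends with a single stop codon.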


Searching a Database for Similar Sequences 


The amino acid sequence of a protein does not 
necessarily reveal the identity of the protein or the 
(A) Computer DNA Sequence File (coding strand): 5' ATGTCCAC ... 3'

Bioinformatics: Run a computer program that translates this sequence into an amino acid sequence in all three reading frames.

(B) Possible amino acid sequences

5' → 3' Frame 1
Met Ser Thr Arg Ser Trp Lys Thr Gln Ala Trp Ala Gly Asn Ser Leu Thr
Leu Asp Arg Lys Gln Ala Ile Leu Lys Thr Thr Ala Ile Lys Met Val Pro Tyr
His STOP Ser Ser His Ser Lys Lys Lys Leu Val His Trp Pro Lys Tyr Cys
Ala Tyr Leu Arg Arg Met Met STOP Thr STOP Pro Thr Leu Asn Leu Asp
Leu Leu Val STOP Arg Lys Met Ser Met Asn Phe Ser Pro Ile Trp Ile Asn
Val Ala Cys Leu Leu STOP Gln Thr Ser Ser Arg Ser STOP Gly Met Thr
Leu Val Pro Leu Ser Met Ser Phe His Glu Ile Arg Arg Lys Thr Gln Cys
Pro Gly Ser Gln

5' → 3' Frame 2
Cys Pro Arg Gly Pro Gly Lys Pro Arg Leu Gly Gln Glu Thr Leu STOP Leu
Trp Thr Gly Asn Lys Leu Tyr STOP Arg Gln Leu Gln Ser Lys Trp Cys His
Ile Thr Asp Leu Leu Thr Gln Arg Arg Ser Trp Cys Ile Gly Gln Ser Ile Ala
Leu Ile STOP Gly Glu STOP Cys Lys Pro Asp Pro His STOP Ile STOP
Thr Phe Ser Phe Lys Glu Arg STOP Val STOP Ile Phe His Pro Phe Gly
STOP Thr STOP Pro Ala Cys Ser Asp Lys His His Gln Asp Leu Glu Ala
STOP His Trp Cys His Cys Pro STOP Ala Phe Thr Arg STOP Glu Glu
Arg His Ser Ala Leu Val Pro Lys

5' → 3' Frame 3
Val His Ala Val Leu Glu Asn Pro Gly Leu Gly Arg Lys Leu Ser Asp Phe
Gly Gln Glu Thr Ser Tyr Ile Glu Asp Asn Cys Asn Gln Asn Gly Ala Ile Ser
Leu Ile Phe Ser Leu Lys Glu Glu Val Gly Ala Leu Ala Lys Val Leu Arg Leu
Phe Glu Glu Asn Asp Val Asn Leu Thr His Ile Glu Ser Arg Pro Ser Arg
Leu Lys Lys Asp Glu Tyr Glu Phe Phe Thr His Leu Asp Lys Arg Ser Leu
Pro Ala Leu Thr Asn Ile Ile Lys Ile Leu Arg His Asp Ile Gly Ala Thr Val His
Glu Leu Ser Arg Asp Lys Lys Lys Asp Thr Val Pro Trp Phe Pro


FIGURE 7.10 Translating a gene DNA sequence into a protein amino acid sequence. Computer programs are designed to use the genetic code to translate a gene sequence to an amino acid sequence. Although this can be done by hand, it is extremely time consuming. (A) Coding strand of a DNA sequence. (B) Three possible translation reading frames. The correct reading frame is the one that lacks translation stop codons (here, Frame 3). Met is translation start; STOP indicates translation termination.


actual function of the protein in the cell. Scientists use 
several clues to figure out the role of a protein in a 
cell. For example, sequence comparison will reveal whether a
gene sequence (or the translated amino acid sequence) 
is similar to a known gene or protein sequence. If the 
sequences are similar, subsequent experiments are 
based on the idea that the two proteins might each perform the same or similar functions. The regions of similar amino acids might indicate a possible function for the new protein or might signal an evolutionary relationship between the genes.

(A)
Sequence 1: N-G-C-A-N-N-C-T-T-A-G-C-N-T-A-A-G-C-G-C
Sequence 2: N-G-C-A-N-N-G-A-T-A-A-C-N-T-A-A-G-C-G-C

(B)
Sequence 1: NGCANNCTTAGCNTAAGCGC
Sequence 3: NGC---GATAGCNTAAGCGC

(C)
Sequence 1: NGCANNCTTAGCNTAAGCGC
Sequence 2: NGCANNGATAACNTAAGCGC
Sequence 3: NGC---GATAGCNTAAGCGC

FIGURE 7.11 Sequence alignments. (A) Generalized pair-wise alignment of two closely related DNA sequences. (B) Sequence 1 in a pair-wise “gapped” alignment with another sequence. The “gaps” occur as a result of varying mutations in the different sequences throughout evolution. (C) Multiple sequence alignment: N = any nucleotide; shaded regions show differences between sequences. Adapted from Brooker et al., Biology, McGraw Hill, New York, 2007.

The most straightforward way to determine the 
degree or percent of similarity between two or more 
nucleic acid or protein sequences is to perform a com- 
puter comparison by sequence alignment. Comparing 
one sequence directly with one other sequence is 
called a pair-wise sequence alignment; comparing one 
sequence in parallel with several other sequences is a 
multiple sequence alignment (Figure 7.11). 
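Percent identity, one simple measure of similarity, can be computed once two sequences have been aligned: count the aligned columns in which both sequences carry the same letter. The sketch below applies this to the sequences of Figure 7.11. Two simplifications are assumptions of this sketch: columns containing a gap (“-”) are skipped, and the placeholder N is compared as an ordinary letter, even though in the figure N stands for any nucleotide.

```python
def percent_identity(seq_a, seq_b):
    """Percent of gap-free aligned columns in which both sequences match.

    The sequences must already be aligned, i.e. have the same length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Sequences from Figure 7.11.
seq1 = "NGCANNCTTAGCNTAAGCGC"
seq2 = "NGCANNGATAACNTAAGCGC"
seq3 = "NGC---GATAGCNTAAGCGC"

print(percent_identity(seq1, seq2))            # 85.0
print(round(percent_identity(seq1, seq3), 1))  # 88.2
```

Sequences 1 and 2 differ at 3 of their 20 positions (17 of 20 identical, 85%); for sequences 1 and 3, 15 of the 17 gap-free columns match.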

As with many analyses now carried out by comput- 
ers, sequence alignments were originally performed by 
hand, by visually searching for stretches of two amino 
acid or DNA sequences that were similar to each 
other. Doing the alignments is challenging because 
changes such as base pair substitutions or deletions 
occur in gene sequences throughout evolutionary time. 
A base pair substitution results in a difference between 
sequences, and a deletion of base pairs results in a gap 
in one sequence relative to another in the alignment. In 
other words, while the genes may still be significantly 
related to each other, the alignment of the sequences 
becomes difficult without the help of a computer. 
Modern database tools accommodate sequence incon- 
sistencies and gaps, and some can be used to readily 
perform multiple sequence alignments. 
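To see how a computer can place gaps automatically, here is a minimal sketch of the classic Needleman–Wunsch global alignment algorithm, which uses dynamic programming. It is not the method BLAST uses, and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions. The point is that once the highest-scoring alignment is found, a deletion in one sequence shows up as a gap (-).

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of sequences a and b by dynamic programming.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align two letters
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one best alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score)   # 2
print(top)     # ACGT
print(bottom)  # A-GT
```

Here the missing C in the second sequence is placed as a gap, which gives three matches and one gap instead of a string of mismatches.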

The Basic Local Alignment Search Tool (BLAST) 
is one of the principal online tools used for sequence 
alignments. BLAST can be accessed through NCBI 
and GenBank home pages. It contains several pro- 
tein, nucleotide, and genome sequence databases 
and is used to find regions of similarity when compar- 
ing sequences in nucleic acid and protein databases. 
Using BLAST is relatively simple: a researcher submits 
a sequence query (the sequence of interest, which can
be copied and pasted into the web page) and BLAST 
compares that query sequence against the sequences 
in selected databases, and finds matching sequences 
among the millions of sequences available in the data- 
base. Imagine trying to do this without a computer, 
and you can see why many scientists consider BLAST 
to be the most important tool in the bioinformatics 
workshop. The BLAST search home page allows a sci- 
entist to select from several different programs, depend- 
ing on the sequence being entered and the needs of the 
researcher. The researcher can choose to run nucleotide 
comparisons, protein comparisons, or can have the 
program translate the nucleotide sequence into its pre- 
dicted protein product and compare the predicted pro- 
tein to the sequence databases of known proteins. After 
the program searches and finds similar sequences in dif- 
ferent databases and organisms, it calculates the statisti- 
cal significance of each match. This information assists 
the researcher in deciding the relative biological impor- 
tance of the different sequence similarities found in the 
searches. 
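BLAST itself works by finding short matching "words" between the query and database sequences, extending those hits, and then scoring them statistically. That is more than we can show here. The toy sketch below illustrates only the first idea: it collects the short exact words (k-mers) that a query shares with each database entry and ranks the entries by how many words they share. The database entries and the word size k = 11 are hypothetical choices for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k words (k-mers) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_search(query, database, k=11):
    """Rank database entries by the number of k-mers they share with the query."""
    query_words = kmers(query, k)
    hits = []
    for name, subject in database.items():
        shared = len(query_words & kmers(subject, k))
        if shared > 0:
            hits.append((name, shared))
    # Entries sharing the most words come first, a crude "best match" list
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Hypothetical two-entry database
database = {
    "hypothetical_seq_A": "GGGGGGATTGTGATTGGCGCTATAAT",
    "hypothetical_seq_B": "CCCCCCCCCCCCCCCCCCCCCCCC",
}
query = "ATTGTGATTGGCGCTATAATAGTCG"
print(toy_search(query, database))  # [('hypothetical_seq_A', 10)]
```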

These results might help the scientist to decide 
the direction that the research experiments take in 
the future. For example, if the two proteins contain a 
region of similar amino acids, and this region of the 
known protein binds to actin, it is logical to infer that 
the unknown protein also binds to actin. This clue to 
the possible role of the protein inside the cell can now 
be tested experimentally. Researchers could follow up 
using actin-binding assays to determine if the unknown 
protein can in fact bind to actin. If it does, then the
researchers could produce a mutant protein containing
a mutation that alters the region of amino acid similarity
in the unknown protein and test whether the mutant
protein loses its ability to bind actin.


Using BLAST to Compare Nucleotide 
Sequences 


The BLAST programs are useful for basic research and
will be discussed in detail. NCBI offers helpful tutorials
for review of the basics and for moving beyond
(www.ncbi.nlm.nih.gov/Education/BLASTinfo/tut1.html).

In this example, a researcher has the following
partial gene sequence that she or he wishes to begin to
characterize. The sequence is shown here in blocks of
10 bases for readability, but it is entered (typed by hand
or pasted from a computer file) as one continuous string
of lowercase letters when submitted as a query.


attgtgattg gcgctataat agtcgtctcg gttttacaac cctacatctt cctagcaacg
gtgccagtgc tagtgacctt tattttactg agggcctact tccttcacac attacagcag
ctcaaacaac tggaatctga aggcaggagt ccaattttca cccaccttgt tacaagctta
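Preparing the query can itself be scripted. This sketch removes the spaces and line breaks from the block-formatted sequence above and, optionally, wraps it in FASTA format (a ">" header line followed by lines of sequence), which the BLAST query box also accepts. The header text "partial_gene_query" is a made-up label.

```python
import textwrap

# The query from the text, copied in its 10-base blocks
BLOCKS = """
attgtgattg gcgctataat agtcgtctcg gttttacaac cctacatctt cctagcaacg
gtgccagtgc tagtgacctt tattttactg agggcctact tccttcacac attacagcag
ctcaaacaac tggaatctga aggcaggagt ccaattttca cccaccttgt tacaagctta
"""


def to_query(text):
    """Join the blocks into one continuous lowercase string, dropping all whitespace."""
    return "".join(text.split()).lower()


def to_fasta(header, seq, width=60):
    """Return a FASTA record: '>' header line, then the sequence in fixed-width lines."""
    lines = textwrap.wrap(seq, width)
    return ">" + header + "\n" + "\n".join(lines) + "\n"


query = to_query(BLOCKS)
print(len(query))  # 180
print(to_fasta("partial_gene_query", query))
```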
The sequence is submitted online as a query to
BLAST, using the nucleotide blast program (blastn) to
compare the nucleotide query against nucleotide
databases, with parameters set to search all nucleotide
databases and to find “somewhat similar” sequences
(Figure 7.12). As a researcher becomes experienced
with database searches and data mining, search
parameters can be refined, perhaps limiting the search
to sequences within specific organisms or choosing to
search only certain databases.

The results of the BLAST search are presented in 
three ways: 


• A graphical overview

• A listing of the matching sequences

• Pair-wise sequence alignments of the top (best)
sequence matches


Each of these representations of the data is associ- 
ated with links to additional online information about 
the sequences that match the query. The results also 
provide a Score (S) that is assigned to each match (the 
higher the score, the better the match) and an Expect 
value (E) that measures the number of hits (matching 
sequences) that can be expected by random chance 
with a particular query (the lower the E-value, the 
more likely that the sequences are related to each 
other). For example, an E-value of 10 indicates that
about 10 matches scoring at least this well would be
expected just by chance. Thus, the lower the E-value,
the better the chance that the query sequence is related
to the sequence found by the search. The E-values
shown in Figure 7.13 are very low numbers indeed; the
best match has an E-value of 4e-71 (or 4 × 10^-71). This
very small value means it
is likely that the query sequence, the unknown gene 
the researcher wants to characterize, encodes a cystic 
fibrosis transmembrane conductance regulator (CFTR) 
protein (see Chapter 10). The high level of sequence 
similarity indicated by the E-value can also be visual- 
ized by looking at the pair-wise sequence alignment 
produced by comparing the query sequence with the 
Mus musculus CFTR sequence (Figure 7.13). 
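The relationship between score and E-value can be made concrete. For a normalized score S′ expressed in bits, BLAST statistics give approximately E = m × n × 2^(-S′), where m is the length of the query and n is the total length of the database searched. The sketch below simply evaluates that formula. The lengths are hypothetical, and real BLAST reports apply corrections (such as effective lengths) that this sketch ignores.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    m = query length, n = total database length (in letters).
    Ignores the effective-length corrections used by real BLAST.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search space: a 180-base query against a 1-billion-base database
m, n = 180, 1_000_000_000
for s in (20, 40, 80):
    print(s, expect_value(s, m, n))
```

Each additional bit of score halves the expected number of chance hits, which is why strong matches reach E-values as small as the 4e-71 reported for the best hit in Figure 7.13.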


Using BLAST to Compare Nucleotide 
Sequences with Protein Sequences 


It is not unusual to use a nucleic acid sequence as a 
query and obtain results containing only unacceptably 
high E-values or a message stating that no significant 
similarity was found. This does not mean that the avail- 
able electronic databases do not contain the desired 
information. Nucleic acid sequences change over time,
a process called divergence, but because the genetic
code is redundant, the protein sequences encoded by
two divergent genes can remain relatively unchanged.
Also, if the partial
FIGURE 7.12 BLAST home
page online. (A) From this site a 
researcher can carry out BLAST 
searches. (B) The researcher 
enters a sequence of interest into 
the query box and the computer 
will compare it to selected nucle- 
otide and protein databases. A 
variety of other databases are 
available for the advanced user 
in addition to BLAST searches. 




[Figure 7.13: (A) Distribution of top 10 hits using sequence query, with a color key for alignment scores. (B) Identities of the top 10 hits, listing accession numbers, descriptions, E-values, and links to other resources; the hits include mouse (Mus musculus) and rat (Rattus norvegicus) cystic fibrosis transmembrane conductance regulator (Cftr) mRNAs. (C) Alignments of top 2 hits, beginning with NM_021050.2, Mus musculus cystic fibrosis transmembrane conductance regulator.]


FIGURE 7.13 BLAST results from comparing nucleotide sequences. (A) A graphical overview of the top 10 matches from the database. The
matches are aligned to the query sequence and the score of the alignment is indicated by color. A researcher can “mouse over” a bar, reveal-
ing the definition of the gene and the exact score. (B) A list of the top 10 matches generated from the query. (C) A pair-wise sequence
alignment with the top matches.


gene sequence used as a query is located in an intron or 
a noncoding region of a gene, the sequence has proba- 
bly diverged to a much greater extent than sequences in 
the exons or coding regions of the gene (see Chapter 6). 
For these and other reasons, a researcher who is not 
successful using a simple nucleotide-nucleotide search 
may get more favorable results by querying a nucleotide 
sequence against a protein database. The blastx
program accomplishes this by translating the nucleotide
query sequence into amino acid sequences (in all six
possible reading frames) and comparing them with
protein sequence databases.
The following sequence query was generated as part of 


an undergraduate classroom project, in which cDNAs 
were copied from the mRNAs expressed in red clover 
(Trifolium pratense) and were partially sequenced. 


aagtgtataa aggtcaaatt attggcatcc atcaacgccc tggggacttg
gccttgaatg tttgcaagaa aaaagctgca acaaacattc gttccaacaa
ggaacaatca gtgattcttg atacaccatt ggattacagt ctggatgact
gcattgagta catccaagaa gatgaactag tagagatcac cccccaaagt


When a search was carried out using “nucleotide 
blast” (RNA or DNA query used to search the nucleotide 
database), no significant matches were found (all E-values 
were greater than 1). The same DNA sequence, when 
gi|68057657|gb|AAX87910.1| GTP-binding protein TypA/BipA [Haemophilus influenzae 86-028NP]
gi|68249458|ref|YP_248570.1| GTP-binding protein TypA/BipA [Haemophilus influenzae 86-028NP]

Length=616

Score = 75.5 bits (184), Expect = 6e-13
Identities = 35/67 (52%), Positives = 50/67 (74%), Gaps = 1/67 (1%)
Frame = +3

[Pair-wise alignment of query positions 3-200 (translated in frame +3) with subject (Sbjct) residues 524-590; the final segment reads:]

Query  180  VEITPQS  200
            VE+TP+S
Sbjct  584  VEVTPES  590


FIGURE 7.14 DNA Sequence Alignment (blastx search). This sequence alignment was generated by blastx, which uses a translated nucle-
otide query to search protein databases. In this example the query is a nucleotide sequence from an unknown red clover cDNA sequence.
The cDNA is made by copying mRNA into DNA (see Chapter 5). In the alignment shown, the middle line, located between the query
sequence (Query) and the subject sequence (Sbjct), indicates the positions of exact amino acid matches between the two sequences
(single-letter amino acid designations are shown). Conservative amino acid changes, in which the R groups of the amino acids are
chemically similar, are represented by a (+). When the R groups of the amino acids are very dissimilar, they are represented by a dash. As with
a nucleotide BLAST search, graphical alignments and a listing of significant matches are returned but are not shown in this figure.


submitted to blastx (the DNA query was translated into 
an amino acid sequence and used to search the protein 
database), resulted in matches that showed significant 
similarities to the sequence of a putative bacterial
GTP-binding protein TypA. Since the sequencing and
BLAST search were initially performed in 2003, many
more GTP-binding proteins from a variety of organisms
have been added to the international sequence databases.
Most of these sequences do not show significant
nucleotide sequence similarity to the sequence derived
from the clover cDNAs shown above, but using blastx,
more than 150 of the GTP-binding protein entries are
returned with E-values of 2e-10 or lower. Like nucleotide
blast searches, blastx searches return a graphical
overview of sequence alignments, a list of matches, and
pair-wise sequence alignments of matches. A pair-wise
alignment of the translated red clover sequence with a
bacterial GTP-binding protein is shown in Figure 7.14.
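Because the reading frame of a partial cDNA is unknown, blastx considers all six possible translations: three reading frames on the sequence as given and three on its reverse complement. The sketch below reproduces only that translation step for the red clover query, using the standard genetic code; the protein-database search itself is not shown. In this sketch, frame -1 starts at the first base of the reverse complement. Frame +3 (translation starting at base 3), the frame reported in Figure 7.14, yields the amino acid sequence that blastx aligned with the bacterial GTP-binding protein.

```python
from itertools import product

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons starting at the first base; unknown codons become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return translations of frames +1..+3 and -1..-3 (reverse complement)."""
    dna = "".join(dna.split()).upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


# Red clover cDNA query from the text
CLOVER = """
aagtgtataa aggtcaaatt attggcatcc atcaacgccc tggggacttg
gccttgaatg tttgcaagaa aaaagctgca acaaacattc gttccaacaa
ggaacaatca gtgattcttg atacaccatt ggattacagt ctggatgact
gcattgagta catccaagaa gatgaactag tagagatcac cccccaaagt
"""

for frame, protein in six_frames(CLOVER).items():
    print(frame, protein)
```

In the output, the +3 line begins VYKGQIIGIHQRPG and ends VEITPQS, matching the query line of the alignment in Figure 7.14.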


Researchers often use the computer to compare two or more 
protein sequences or parts of protein sequences. BLAST pro- 
grams can also predict the amino acid sequence of a protein 
from the DNA sequence of the gene. 


Using BLAST to Compare Protein Sequences 
with Other Protein Sequences 


Here we have focused on using BLAST to demonstrate 
sequence similarity searches, but BLAST is not the only 
online program available for this purpose. Another 
example is FASTA (FAST-All), which compares protein
or nucleotide sequences. BLAST and FASTA use different 
computer algorithms to carry out sequence alignments 


and calculate the E-values in different ways. BLAST is 
generally faster than FASTA, but FASTA may be a better 
choice if the researcher is trying to compare very differ- 
ent sequences. 

Pair-wise sequence alignments are very useful, but 
multiple sequence alignment (MSA) is also an important 
tool in studying proteins. MSA can provide information 
about amino acid sequences that are conserved among 
many different proteins, sometimes called sequence 
motifs. The amino acid sequences that form protein
regions important for the structure, function, or
regulation of the protein are sometimes conserved relatively
unchanged during evolution, and as a result these 
motifs are often present in all proteins with similar 
functions. Information regarding conserved DNA and 
protein sequences is very useful in experimental design, 
because it allows researchers to target specific regions 
of a protein for further investigation, as described ear- 
lier for the actin-binding site motif. 
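Finding conserved positions in a multiple alignment can be sketched as a column-by-column check. The example below uses the three aligned sequences of Figure 7.11C and reports a column as conserved only when every sequence carries the same definite base (not N and not a gap). That strict rule is our simplifying assumption; MSA programs typically also score partial or chemically similar conservation.

```python
def conserved_columns(aligned):
    """Return (1-based positions, consensus) for columns identical in every sequence.

    A column counts only if all letters match and none is N or a gap (-).
    Non-conserved columns are shown as '.' in the consensus string.
    """
    positions, consensus = [], []
    for pos, column in enumerate(zip(*aligned), start=1):
        letters = set(column)
        if len(letters) == 1 and not letters & {"N", "-"}:
            positions.append(pos)
            consensus.append(column[0])
        else:
            consensus.append(".")
    return positions, "".join(consensus)


alignment = [
    "NGCANNCTTAGCNTAAGCGC",  # Sequence 1
    "NGCANNGATAACNTAAGCGC",  # Sequence 2
    "NGC---GATAGCNTAAGCGC",  # Sequence 3
]
positions, consensus = conserved_columns(alignment)
print(positions)
print(consensus)  # .GC.....TA.C.TAAGCGC
```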

Several online programs are available for making 
multiple sequence alignments, including the popular 
ClustalW (www.ebi.ac.uk/clustalw/#, also accessed 
through PIR). As with BLAST, ClustalW is easy to use 
and tutorial help is available on the site. 


APPLIED BIOINFORMATICS 


Phylogenetics: Discovering Evolutionary 
Relationships 


The ability to search for identity and similarity among 
sequences is an important skill for scientists engaged in 
many fields including phylogenetics, the study of evo- 
lutionary relationships among and between organisms. 
Box 7.2 Activity: Perform BLAST Searches


Each of the following sequences belongs to a gene that causes a
human disease when mutated. Your goal in this activity is
to perform a simple nucleotide BLAST search to identify the 
gene, explore various features of the gene, and determine 
which genetic disease is caused by a mutation in the gene. 


1. First carry out your BLAST search: 

• Visit the NCBI site (http://www.ncbi.nlm.nih.gov/).

• Click on the box labeled “BLAST.”

• On the BLAST web page, choose “nucleotide blast.”

• In the box under “Enter Query Sequence,” type in one
of the partial sequences shown below (or copy and
paste the sequence in as a query).

• Click the BLAST button at the bottom of the page.


A few tips: 


• The default settings permit a search of the human
genome/transcript databases for highly similar matches.
Leave the default settings in place for this search.

• Searches can take from only a few minutes to much
longer, depending on the number of users on the site.
If the search is taking more than a few minutes, it is
advisable to save the Request ID (RID) associated with
your search. This number is near the top of the page
and can be used to access the results of the search at a
later time.
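The same search can, in principle, be submitted from a script instead of the web form. NCBI documents a "Common URL API" for BLAST: a CMD=Put request submits a query and returns a Request ID (RID), and a later CMD=Get request with that RID retrieves the results. The sketch below only builds those request URLs and extracts an RID from response text; it sends nothing over the network. The parameter names and the database name "nt" reflect our understanding of NCBI's documentation and should be checked against the current version, and the sample response text is made up.

```python
import re
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def put_url(query, program="blastn", database="nt"):
    """Build a URL that would submit a search (CMD=Put). Nothing is sent."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database,
              "QUERY": query}
    return BLAST_URL + "?" + urlencode(params)


def get_url(rid, format_type="Text"):
    """Build a URL that would fetch results for a saved Request ID (CMD=Get)."""
    return BLAST_URL + "?" + urlencode({"CMD": "Get", "RID": rid,
                                        "FORMAT_TYPE": format_type})


def parse_rid(response_text):
    """Pull the RID out of a submission response ('RID = ...' line), or None."""
    match = re.search(r"RID = (\S+)", response_text)
    return match.group(1) if match else None


# Hypothetical response fragment; real responses contain much more text
sample = "<!--QBlastInfoBegin\n    RID = EXAMPLE123\n    RTOE = 20\nQBlastInfoEnd\n-->"
rid = parse_rid(sample)
print(put_url("CTTCTAATGGTGATTATGGG"))
print(get_url(rid))
```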


Partial query sequences: 


1. GTGTGTGATGAGCGGACGTCCCTAATGTCGGCCGAGAGCCCCACGCCGCGCTCCTG

2. CTTCTAATGGTGATTATGGGAGAACTGGAGCCTTCAGAGGGTAAAATTAAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGG

3. AAGTTACTGGTGGAAGAGTTGCCCCTGCGCCAGGGAATTCTCAAACAATTAAATGAAACTGGAGGACCCG

4. CAGGGAAACGGCATACACTGGAGAAGAATGTGTTGGTTGTCTCTGTAGTCACACCTGGATGTAACCAGCT

5. CGCCCTGGGATTTACCGTGCTTTTAGCGTCCTACACGAGCCATGGGGCGGACGCCAATTTGGAGGC

6. CCATGGATTCTGAATGTGCTTAATTTAAAAGCCTTTGATTTTTACAAAGTGATCGAAAGT


Biologists use comparative sequence data (information 
revealed from sequence alignments) to construct phy- 
logenetic trees (see Figure 7.3). These are diagrams 
showing the evolutionary relationships among various 
species believed to have a single common evolution- 
ary ancestor. The common ancestor species of rodents, 
for instance, eventually gave rise to species of mice and 
rats. After the ancestral species diverged into two
separate species, each branch gave rise to an independent
lineage of organisms that accumulate, by chance, dif- 
ferent mutations, usually small base pair changes in the 
genome DNA. A comparison of a short sequence from 
the beta-globin gene from modern-day mice and rats 


2. When the BLAST search is complete, the results will appear 
on the computer screen. You will see a table that lists 
“Sequences producing significant alignments.” Click on 
a link under “transcripts” to access your report (sequence 
similarity data) and answer the following questions: 

• What is the identity of your gene?

• What is the accession number?

• What is the E-value associated with your match? What
does this E-value tell you?

• Look in the summary section. What information is
given about this gene? If stated, what is the normal function
of the protein encoded by the gene? What disease is
caused when this gene is mutated?

• On which chromosome is the gene located? (Click on
“Genome View” in the results page to get positions of
the BLAST hits in the human genome.)


3. Repeat this activity with two more of the sequences listed 
above. 


4. Choose one of the three sequences used for the basic 
search and do another BLAST search, this time chang- 
ing the search parameters. Instead of only searching the 
human databases, choose “Other,” which will allow a 
search of all nucleotide databases. 

• Is this gene found in any other organism? Which one(s)?

• What is (are) the E-value(s) in the other organism(s)?
What does this tell you?

• If this gene is found in organisms besides humans, click
on one or more accession number(s) of the nonhuman
gene. Is there any indication that this gene is
implicated in disease in other organisms?


5. Repeat the BLAST search, and in addition to choos- 
ing “Other” databases, also choose “somewhat similar 
sequences” under Programs. 

• How does the number of matches compare to your first
search?

• In what other organisms is the gene found?

• What are the E-values in the new organisms?

• What does the E-value tell you about the evolutionary
relatedness of the organisms to humans?


shows that the sequences are similar but not identical 
(Figure 7.15). The beta-globin genes in mice and rats 
are homologous, because they were derived from the 
same ancestral gene and have similar functions in both 
mice and rats (see Chapter 10). 
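The degree of difference between the mouse and rat segments can be expressed as a number. This sketch counts the differing positions in the two 60-base segments, as we transcribe them from Figure 7.15A (the extracted figure text may contain errors), to obtain the proportion of differing sites, p. It then applies the Jukes–Cantor correction, d = -(3/4) ln(1 - 4p/3), a standard but simple model that estimates how many substitutions per site actually occurred, allowing for sites that changed more than once. Distances of this kind are one form of comparative sequence data from which phylogenetic trees can be built; the choice of model here is ours, not the authors'.

```python
import math

# Segments as transcribed from Figure 7.15A (60 bases each)
MOUSE = "GGGCAGGTTGGTATCCAGGTTACAAGGCAGCTCACAAGTAGAAGCTGGGTGCTTGGAGAC"
RAT = "GGGCAGGTTGGTATCCAGGTTACAAGGTAGCTCCTAAGTAGAAGTTTGGTGCTTGGAGAC"


def p_distance(a, b):
    """Proportion of aligned positions that differ (no gaps assumed)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences / len(a)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; undefined when p >= 0.75."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor formula")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance(MOUSE, RAT)
print(f"differing sites: {round(p * len(MOUSE))} of {len(MOUSE)}")  # 5 of 60
print(f"p = {p:.4f}, Jukes-Cantor d = {jukes_cantor(p):.4f}")
```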

The redundancy of the genetic code means that a 
change in a nucleotide might not lead to a change in 
an amino acid and a mutation that leads to a change 
in an amino acid does not always change the function 
of a protein (see Chapter 10). Evolutionary biologists 
often look at slowly evolving proteins as a “molecu- 
lar fossil record” that can be used to study the rela- 
tionships between organisms. These biologists want 
(A) A comparison of one DNA strand in the mouse and rat β-globin genes

Mouse  GGGCAGGTTGGTATCCAGGTTACAAGGCAGCTCACAAGTAGAAGCTGGGTGCTTGGAGAC
       ||||||||||||||||||||||||||| |||||  ||||||||| | |||||||||||||
Rat    GGGCAGGTTGGTATCCAGGTTACAAGGTAGCTCCTAAGTAGAAGTTTGGTGCTTGGAGAC


(B) The formation of homologous β-globin genes during evolution of mice and rats
[Diagram: β-globin gene in common ancestor of mice and rats → random accumulation of mutations over many generations → evolutionary divergence produced many species of rodents.]


FIGURE 7.15 Comparison of short segments of the β-globin gene in
mice and rats. (A) Identical bases are connected by a vertical line; differ-
ences in bases are shown in red. The sequences are similar but not iden-
tical because mice and rats have accumulated different mutations since
their lineages diverged from a common rodent ancestor. (B) Simplified
representation of the formation of divergent forms of β-globin. Adapted
from Brooker et al., Biology, McGraw Hill, New York, 2007.


to know the amount of sequence divergence between 
two proteins, because the more similar the two protein 
sequences are to each other, the more closely related 
to each other the two organisms are likely to be. More
distantly related organisms tend to have more differences
in their protein sequences. The similarities between gene or
protein sequences often provide important clues about 
the function of an unknown protein. Proteins with sig- 
nificant sequence similarity are conserved and are pre- 
dicted to be members of the same protein (or gene) 
family. The combined powers of molecular biology, 
genetics, and bioinformatics have allowed researchers 
to establish functional relationships among proteins and 
to propose how the proteins and their genes evolved. 
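The effect of code redundancy is easy to demonstrate. In this sketch, two hypothetical nine-base DNA sequences differ at three of their nine positions, yet both encode the same three amino acids (Ala-Gly-Lys), because each difference falls in the third position of a codon. The table holds only the six codons needed for the example; it is a small excerpt of the standard genetic code, not the complete code.

```python
# Excerpt of the standard genetic code (only the codons used below)
CODONS = {"GCT": "Ala", "GCC": "Ala", "GGT": "Gly", "GGC": "Gly",
          "AAA": "Lys", "AAG": "Lys"}


def translate(dna):
    """Translate a DNA sequence whose codons all appear in CODONS."""
    return "-".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_differences(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


gene_1 = "GCTGGTAAA"  # hypothetical
gene_2 = "GCCGGCAAG"  # hypothetical; differs at three third-codon positions

print(count_differences(gene_1, gene_2))  # 3
print(translate(gene_1))  # Ala-Gly-Lys
print(translate(gene_2))  # Ala-Gly-Lys
```

A nucleotide comparison sees only 6 of 9 positions as identical, while the encoded peptides are identical, which is why protein comparisons can reveal relationships that nucleotide comparisons miss.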


Bioinformatics in Modeling 
Protein Structure 


The function of a protein depends on the three- 
dimensional structure adopted by folding the linear 
chain of amino acids after translation in the environ-
ment of the cell (see Chapter 10). Traditionally the 
shapes of protein structures have been determined by 
x-ray crystallography or nuclear magnetic resonance 
spectroscopy (NMR). These techniques yield the actual 
physical positions in three-dimensional space of each 
atom of a protein molecule, but these methods are 
time consuming and require expensive specialized 
equipment and personnel. However, researchers using 
these techniques have substantially increased the num- 
bers of three-dimensional protein structures available 
in the world’s public databases. Bioinformatics pro- 
vides tools that simplify the process of visualizing the 
potential three-dimensional protein structures by using 
databases of x-ray crystallography and NMR informa- 
tion to perform online protein modeling that permits 
scientists to predict the shapes of proteins. 

Using the database information, protein mode- 
ling software can sometimes predict the structures of 
uncharacterized or unidentified proteins. This is pos- 
sible because the structure of a protein depends at 
least in part on the sequence of the amino acids in the 
protein, and the sequence of amino acids depends on 
the gene sequence. When the amino acid sequence is 
known, it is possible to use protein modeling programs 
that compare the unknown sequence with the identi- 
fied protein structures in available databases. Proteins 
with similar amino acid sequences are likely to fold 
into similar three-dimensional structures and probably 
have similar functions. 

Even if two proteins do not have identical sequences 
overall, it is still possible for the proteins to share short 
stretches of similar (or even identical) amino acids. These 
similar amino acid sequence motifs might fold in a
similar manner in two different proteins, suggesting that
the motifs have a common function, even though the
overall amino acid sequences of the proteins differ. For
example, the CFTR protein has two regions that serve
as ATP-binding domains (Figure 7.16). Other proteins
containing similar amino acid motifs might also bind to
ATP as part of their function. For example, the sequence
of a motif in the multidrug resistance associated protein
(which confers resistance to anticancer drugs) is
similar to known ATP-binding domains (Figure 7.16). Thus,
it is possible that the multidrug resistance associated
protein also binds to ATP, a possibility that could be
exploited in an anticancer treatment strategy; blocking
ATP binding might inhibit growth of the cancer cells.
Many pattern recognition and protein modeling pro- 
grams are available through UniProt and NCBI, which 
allow researchers to perform extensive online analyses 
of protein domains, protein families, and potential three- 
dimensional structures. As the genome projects across 
the world produce more and more DNA sequences from 
different organisms, more protein families are being
(A) [Diagram of the CFTR protein in the membrane, showing the transmembrane domains, the ATP-binding domains NBF1 and NBF2, the R domain, and the location of NBF1 mutations. T = transmembrane domain; R = chloride channel; NBF = ATP-binding domain.]

(B)
NBF1  Ser-Thr-Gly-Ser-Gly-Lys-Thr-Ser
NBF2  Arg-Thr-Gly-Ser-Gly-Lys-Ser-Thr
MDR   Arg-Thr-Gly-Ala-Gly-Lys-Ser-Ser


FIGURE 7.16 CFTR gene mutations cause cystic fibrosis. (A) CFTR protein showing the ATP-binding domains (NBF1 and NBF2).
(B) Comparison of the amino acid sequences of the NBF1 and NBF2 domains shows that they have similar sequences. The similarity in sequence
with part of the human multidrug resistance (MDR) protein suggests that MDR might also contain an ATP-binding site.


identified by the presence of common sequence motifs. 
As novel protein domains and protein folding patterns 
are discovered, the task of online protein modeling will 
become an increasingly important skill for biologists 
studying protein function. 
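Pattern recognition can be illustrated with the three short sequences in Figure 7.16B. This sketch converts them to one-letter amino acid codes, builds a simple pattern that allows, at each position, whichever residues were observed there (written as a regular expression), and then scans another sequence for it. The scanned sequence is hypothetical, and a pattern derived from only three sequences is a toy; real pattern databases are built from many more sequences and often use more sophisticated statistical models.

```python
import re

# Three-letter to one-letter codes (only the residues in Figure 7.16B)
ONE_LETTER = {"Ala": "A", "Arg": "R", "Gly": "G", "Lys": "K",
              "Ser": "S", "Thr": "T"}

FIGURE_7_16B = {
    "NBF1": "Ser-Thr-Gly-Ser-Gly-Lys-Thr-Ser",
    "NBF2": "Arg-Thr-Gly-Ser-Gly-Lys-Ser-Thr",
    "MDR": "Arg-Thr-Gly-Ala-Gly-Lys-Ser-Ser",
}


def to_one_letter(three_letter):
    """Convert 'Ser-Thr-...' notation to a one-letter string."""
    return "".join(ONE_LETTER[res] for res in three_letter.split("-"))


def column_pattern(sequences):
    """Regex allowing, at each aligned position, any residue seen there."""
    parts = []
    for column in zip(*sequences):
        residues = sorted(set(column))
        parts.append(residues[0] if len(residues) == 1
                     else "[" + "".join(residues) + "]")
    return "".join(parts)


seqs = [to_one_letter(s) for s in FIGURE_7_16B.values()]
pattern = column_pattern(seqs)
print(seqs)     # ['STGSGKTS', 'RTGSGKST', 'RTGAGKSS']
print(pattern)  # [RS]TG[AS]GK[ST][ST]

hypothetical_protein = "MKLRTGAGKTSEE"  # made-up sequence for illustration
match = re.search(pattern, hypothetical_protein)
if match:
    print(match.start(), match.group())  # 3 RTGAGKTS
```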


Analysis of Gene Expression 


As discussed earlier, genome sequencing projects have 
generated gigabases of sequence data available in 
a variety of databases. However, scientists agree that 
knowing the genome DNA sequence of an organism 
is really just the beginning of the study of the structure 
and function of the organism. Take the case of the little 
mustard plant, Arabidopsis thaliana, considered to be 
a weed by many. The A. thaliana DNA genome is rela- 
tively small (~114.5 megabases) and contains about 
25,000 genes. Somehow scientists need to identify the 
locations of genes in a genome without necessarily 
knowing the entire genome sequence. Many years of 
research on genes in bacteria, humans, and now plants 
have revealed some almost universal DNA and RNA 
sequences that indicate landmarks in the genome DNA 
such as the start and end points of a gene and the loca- 
tions of introns and exons (see Chapter 6). The field of 
bioinformatics has translated these molecular signals 
in the sequence into computer tools that help scien- 
tists to answer important questions about sequence 
data. As a result of basic research on many genes and 
through analysis of many gene sequences from many 
organisms, computer programs have been developed 
that recognize the specific DNA sequences of poten- 
tial promoters, transcription start sites, transcription 
termination signals, coding regions, and so on, to help 
researchers identify the locations of putative genes 
in the genome. Of course, the presence of a possible 
gene sequence does not automatically mean that the 


sequence is an actual gene, and sequence alone is not 
evidence that a gene is actually expressed as mRNAs 
in the cells being studied. As discussed earlier in the 
chapter, EST and microarray data can be used to iden- 
tify genes that are expressed in certain cells, with an 
ever-increasing amount of gene expression data avail- 
able in computerized databases. 
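One of the simplest signals that gene-finding programs look for is an open reading frame (ORF): a start codon (ATG) followed, in the same reading frame, by a run of codons that ends in a stop codon (TAA, TAG, or TGA). The sketch below scans the three forward reading frames for such stretches. It is a toy illustration only: it ignores the reverse strand, introns, promoters, and the statistical models that real gene-prediction programs use, and both the minimum length and the test sequence are arbitrary choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Find ATG...stop stretches in the three forward reading frames.

    Returns (start, end, sequence) tuples with 0-based start and exclusive end.
    min_codons is the minimum number of codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next start codon in this frame
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))  # [(2, 17, 'ATGAAATTTGGGTAA')]
```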

The model organism C. elegans is widely used
to study programmed cell death, or apoptosis (see 
Chapter 9). Apoptosis is an important process in 
humans (and other organisms) that is required for nor- 
mal embryo and tissue development and physiology. 
For example, apoptosis is used to eliminate cells in the
tail of developing human embryos, to remove the
tissue growing between developing human fingers, and
to destroy selected nerves to regulate the number of 
neurons in a developing nervous system. Several genes 
important in human cell apoptosis were elucidated 
from studying C. elegans mutants. One of the genes, 
ced-1, encodes a protein that is involved in engulfing 
the debris released by cells that die due to apoptosis. 
Because ced-1 is important in cell death in C. elegans, 
it is quite possible that a gene similar to ced-1 per- 
forms a similar function in other organisms. A BLAST 
search reveals that gene sequences similar to ced-1 
are present in other genomes and that the ced-1-like 
gene is expressed. Using “ced-1” as a query to the EST 
databases through NCBI yields almost 90 hits, which 
indicate that genes similar to ced-1 are expressed in 
specific tissues of several organisms, including human 
embryonic brain, rat (Rattus norvegicus) adult brain, 
kidney, placenta, aorta, and heart. This information is 
central to a better understanding of the ced-1 gene and 
the process of apoptosis in other organisms and will be 
extremely valuable in designing experiments to answer 
further questions about the function of the ced-1 
protein. 
Box 7.3 Huntington’s Disease: The Power of Animal Models and Gene Expression Data


Huntington's disease (HD) is an incurable and fatal hereditary 
neurodegenerative disorder caused by a mutation in the gene 
that encodes the huntingtin protein. The altered huntingtin 
protein destroys specific brain motor neurons, leading to a
lethal decline in motor and cognitive abilities. Recent studies
have indicated that the mutant huntingtin protein alters the
transcriptional activity of other genes in the affected neurons.
Because gene expression changes in the neurons of HD patients,
monitoring the different patterns in gene expression over time 
provides a new avenue for research to understand the mecha- 
nisms of the disease and to devise possible new treatments. 

In one study, scientists measured the gene expression
patterns in seven different strains of transgenic mice used as
models of HD. The transgenic mouse strains were engineered
to exhibit the symptoms associated with the different stages of HD
(Figure 7.17). Analyses of the transgenic mice indicated that 
different forms and different amounts of the huntingtin pro- 
tein had varying effects on the gene expression profiles. After 
completing the study in mice, the researchers then developed 
computational methods to compare the gene expression pat- 
terns in mice with gene expression in the human patients. 
The idea is to use this information to test possible drugs that 
affect gene transcription in mice with eventual use in human 
HD patients. This type of study underscores the importance of 
bioinformatics to store, analyze, and retrieve gene expression 
data as well as the fact that mouse models are a valuable tool 
to study human disease and its potential treatment. 


Analysis of Gene Mutations 


Although it is important to know the sequence of a
gene and the structure of the encoded protein, it is
often also important to understand what happens when
a particular gene (and possibly the encoded protein)
is altered by a mutation. Some mutations cause no
change in the protein product, but others have a
detrimental effect and can lead to disease. As many as
10,000 human genes can cause a genetic disease when
altered by mutations (see Chapter 10). For example,
mutations in the CFTR gene can lead to cystic fibrosis,
mutations in the gene encoding the dystrophin protein
lead to muscular dystrophy, and mutations in the
huntingtin gene lead to Huntington’s disease. Using a
combination of scientific approaches, including
bioinformatics, researchers routinely sequence and
analyze mutant genes to determine whether a mutant
gene is the underlying cause of a disease. If the
mutations do lead to a disease, then the sequence
information can guide the development of treatments,
cures, and preventions.

Researchers studying a specific human disease often
rely on the human genome map, a diagram showing the
relative positions of the known genes along the linear DNA
FIGURE 7.17 Transgenic mice show signs of Huntington’s
disease (HD). (A) Footprints of a normal mouse at age one year. Gray
represents hind paws; black represents front paws. (B) Footprints 
of a mutant mouse with staggering gait at age one year. As the 
gait worsens, the mutant mice develop clumps of protein in 
the brain. The mice have a long stretch of repetitive DNA in the 
gene that encodes the huntingtin protein, the same mutation that 
occurs in humans with HD. Because the mice share so many sim- 
ilarities with humans, the HD mice can help scientists to under- 
stand the molecular basis for the neurodegenerative mechanism 
in HD. 


molecule contained in each chromosome. Various
computer tools available through NCBI and other databases
allow researchers to visualize entire chromosomes and
genomes. NCBI offers a site to “Browse Your Genome”
(Figure 7.18). To find the cystic fibrosis transmembrane
conductance regulator (CFTR) gene on chromosome 7,
click on chromosome 7 and view the map of chromosome 7,
which indicates the locations of known genes, including
CFTR. If you click on CFTR, you will have access to a
large amount of information on this gene.

The use of bioinformatics to find disease genes
and to develop preventions, treatments, and cures is
enhanced by the HapMap Project (see Chapter 10).
The HapMap Project is a partnership of scientists and
funding agencies from many countries that compares
the genomic sequences of many different individuals
to identify shared patterns of genetic variation (RFLPs,
SNPs) among human chromosomes. The goal of the
HapMap Project is to identify, and make available
online, the locations and characteristics of
approximately 10 million single nucleotide
polymorphisms present in the human genome (see Chapter 6).

The HapMap collection of cataloged SNPs is very 
valuable for research to understand human disease. 
For example, researchers are trying to understand the 
FIGURE 7.18 Browse your genome. You can browse the human genome at www.ncbi.nlm.nih.gov/genome/guide/human. From this graphical
view, a researcher can select any chromosome to get additional detailed information about that chromosome. The figure also shows an example of the vast amount of
data available for chromosome 7. CFTR is circled, and clicking on CFTR gives a researcher access to specific information about the gene.


genetic variations in the human genome that increase
susceptibility to cardiovascular disease in some
people. The scientists can search the HapMap
databases for SNPs that are shared in the genomes of
people at risk for (or currently diagnosed with) heart
disease. Specific SNPs shared among the genomes of
at-risk people may indicate that an important gene is
located near those SNPs. In addition, if a researcher has
candidate genes that might be important in heart
disease, it would make sense to look for SNPs in or
around those genes.
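The kind of search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the HapMap interface: the SNP identifiers and genomes are hypothetical, each person is reduced to the set of SNP variants he or she carries, and the two frequency cutoffs are arbitrary values chosen for the example.

```python
# Hypothetical data: each genome is the set of SNP variants it carries.
at_risk = [
    {"snp1", "snp2", "snp5"},
    {"snp1", "snp2", "snp7"},
    {"snp1", "snp3", "snp2"},
]
not_at_risk = [
    {"snp3", "snp5"},
    {"snp2", "snp7"},
    {"snp4"},
]

def carrier_fraction(snp, group):
    """Fraction of genomes in the group that carry the SNP variant."""
    return sum(snp in genome for genome in group) / len(group)

def candidate_snps(cases, controls, min_cases=0.8, max_controls=0.3):
    """SNPs common among at-risk genomes but uncommon among the others.

    The cutoffs are illustrative thresholds, not a statistical test.
    """
    all_snps = set().union(*cases, *controls)
    return sorted(
        snp for snp in all_snps
        if carrier_fraction(snp, cases) >= min_cases
        and carrier_fraction(snp, controls) <= max_controls
    )

print(candidate_snps(at_risk, not_at_risk))  # ['snp1']
```

Here snp1 is flagged because every at-risk genome carries it and none of the others do; snp2 is rejected because one of the three people not at risk also carries it. A real study would use many more genomes and a formal test of statistical association.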


SUMMARY 


Bioinformatics, the marriage of biological data and com- 
puter science, allows scientists to access and analyze 
the huge amount of biological information available 
worldwide. Biological databases are maintained by 
a number of private and government organizations 
and store information about nucleic acid and protein 
sequences, protein structures, genome organization, 
and gene expression. This chapter introduced some 
basic types of data analysis using online resources and 
the tools of bioinformatics. The search for sequence 


similarity usually starts with a simple BLAST search and 
can then begin to address more sophisticated questions 
about protein structure and function and evolution- 
ary relationships among proteins. Although the field 
of bioinformatics is still relatively new, it has already 
enhanced our understanding of basic biological proc- 
esses, it has changed the way we design and carry out 
experiments, and it has the potential to enhance the 
development of medical treatments and advances in all 
areas of biotechnology. 


REVIEW 


To test your knowledge of the chapter’s contents, con- 
sider the following review questions: 


1. Define bioinformatics. 

2. Why is bioinformatics an important tool in the 
study of modern biology? 

3. What types of information can be stored in and 
accessed from biological databases? 

4. What are four main types of information you can 
find in a GenBank record? 

5. Compare and contrast this with the information 
that can be found in UniProt records. 
6. If you carried out a BLAST search using the query
ATCGA, your results would indicate no significant 
matches. Explain why this would happen. 

7. You carry out a nucleotide-nucleotide BLAST 
search with a 100-nucleotide sequence and 
receive results indicating no significant matches. 
You use the same sequence to carry out a search 
for a translated protein sequence and find several 
matches. Explain why matches were found by the 
search of the protein database but not from the 
search of the nucleotide database. 

8. The genomes of many model organisms have 
been sequenced, and the information is stored in 
genomic databases. What is a model organism, 
and why is information from model organisms 
useful to understanding human biology? 

9. Define the terms genome, proteome, and 
database. 

10. What does it mean to say that two DNA sequences 
are homologous? 
The Stranger Within


New Scientist, November 15, 2003 

By Claire Ainsworth 

Explain this. You are a doctor and one of your patients, 
a 52-year-old woman, comes to see you, very upset. Tests 
have revealed something unbelievable about two of her 
three grownup sons. Although she conceived them natu- 
rally with her husband, who is definitely their father, the 
tests say she isn’t their biological mother. Somehow she 
has given birth to somebody else’s children. This isn’t a trick 
question—it’s a genuine case that Margot Kruskall, a doc- 
tor at the Beth Israel Deaconess Medical Center in Boston, 
MA, was faced with five years ago. The patient, who we 
will call Jane, needed a kidney transplant, and so her family 
underwent blood tests to see if any of them would make a 
suitable donor. When the test results came back, Jane was 
hoping for good news. Instead she received a huge shock. 
The letter told her outright that two of her three sons could 
not be her biological children. What was going on? It took
Kruskall and her research team two years to crack the riddle. 
In the end they discovered that Jane is a chimera, a mixture 
of two individuals—nonidentical twin sisters—who fused in 
the womb and grew into a single human body. Some parts 
of Jane’s body were derived from one twin, whereas other 
tissues in Jane were derived from the other twin. It seems 
bizarre that this can happen at all, but Jane’s is not an iso- 
lated case. About 30 similar cases of chimerism have been 
reported, and there are probably many more people who 
are chimeras but who will never discover this fact. 


LOOKING AHEAD 


DNA testing is an extremely powerful tool that has
been used successfully in many areas of scientific
research and to solve problems in biology. If the tests
are performed correctly, the results of DNA testing
almost always point scientists in the right direction,
even when the DNA test results seem to contradict a
mother’s biological connection to her child, as in the
case of Jane, the chimeric woman in Boston.

Different types of DNA tests are appropriate for spe- 
cific applications in forensics to solve crimes, in medi- 
cine to identify the bacteria and viruses responsible for 
human diseases, and in research to help reconstruct 
the evolution of the human race. First, this chapter
investigates how forensic scientists use DNA testing to
establish paternity, solve violent crimes, probe bioter- 
ror attacks, trace family relationships, track the spread 
of disease-causing pathogens, and prosecute poachers 
hunting illegally in the wild. The second half of this 
chapter reveals how anthropologists use DNA test- 
ing to identify the origins of the human race and fol- 
low the early migrations of human populations. Upon 
completing this chapter, you should be able to do the 
following: 


• Explain how variability in the genome sequences of
different individuals provides the molecular basis
for forensic DNA testing.

• Describe the means by which investigators use
forensic DNA testing to establish paternity.

• Describe how investigators use DNA testing to
identify criminal suspects, and explain how they assess
the probability of a coincidental match between an
innocent suspect and the evidence.

• Explain why DNA testing is considered to be a
versatile tool, and cite examples of situations other
than violent crimes in which DNA analysis has
helped solve mysteries.

• Identify the two types of DNA that are present in
human cells, and explain the fundamental
differences between the two genomes.

• Explain how anthropologists use DNA sequences
to estimate the time to the most recent common
ancestor for two populations.

• Describe which of the three hypotheses
(uniregional, parallel evolution, or multiregional) is
currently the most widely accepted by authorities
in the field, and explain why.

• Describe the timeline and the migratory movement
of the people who first populated the Americas.


INTRODUCTION 


You only need to glance around a group of people to
witness the variability of human biology. On any city
sidewalk you can see the complete spectrum of heights,
hair colors, body types, and other physical
characteristics of the human race. Like your physical
characteristics, your DNA sequence is unique; even your
closest relative has a different DNA sequence than you do.
Even identical twins, long thought to have identical
genomes, show a small number of differences in their
DNA sequences that DNA testing can detect, although
their genomes are far less variable than the genomes of
unrelated individuals. The DNA sequence is especially
variable at certain locations in the human genome, and
it is these variable regions that are used to distinguish
one person’s DNA from another person’s and to
determine whether or not two people are identical twins.

The results of DNA testing often have consequences 
that profoundly affect people’s lives. Some results reveal 
the identity of biological parents, while other forms of 
DNA testing are used to convict people of crimes that 
carry serious consequences, including the death
sentence. DNA testing has been subjected to a great deal
of scrutiny by the scientific and legal communities
and has been studied more extensively than any other
commonly used forensic investigation technique.
DNA testing and the laboratories that conduct DNA
testing have improved greatly in the past decade, and 
the system has emerged stronger because of the scru- 
tiny. Modern DNA testing methods are highly reliable, 
and the databases of DNA sequences that are used for 
criminal and anthropological investigations are already 
impressive in size and still growing. Large DNA data- 
bases are important in order to avoid population biases 
that affect DNA testing results. DNA testing can be per- 
formed on any life form, including plants, animals, bac- 
teria, and viruses, making it a versatile tool capable of 
answering a diverse array of perplexing questions. 


FORENSIC DNA TESTING: A POWERFUL 
AND VERSATILE TOOL 


A woman went out on a date with a man she had 
recently met. During the course of the evening, they 
went to his apartment, where he raped her. She tried to 
fight him off and lost one of her contact lenses during 
the struggle. After her assailant let her go, the woman 
went to the police. When the police arrived at the man’s 
apartment to investigate the allegation, the man told the 
police that he had indeed had sex with the woman that 
evening, but that the sex was consensual, and there had 
been no struggle. 

The police noticed that the accused rapist had
cleaned his apartment within the past few hours.
This appeared to be a suspicious coincidence, so the
police examined the contents of the man’s vacuum
cleaner bag. They found several broken shards from
the woman’s contact lens. Forensic analysts were able
to recover enough of the woman’s DNA from the lens
fragments to obtain her DNA profile. This finding
corroborated her statement that she had been forcibly
raped and exposed the man’s lie: the DNA profile tied
the lens fragments to that specific woman, and the
shattered lens confirmed the violence of their encounter
that night. The man was convicted of rape.

There are many such stories that demonstrate the 
amazing power of forensic DNA testing to solve crimes. 
Forensic analysts can produce a DNA profile from a 
fallen hair, a licked envelope, a cigarette butt, and even 
a few broken shards of a contact lens. DNA testing 
has enabled investigators to solve hundreds of cases, 
including decades-old “cold cases” for which they had 
exhausted all leads. DNA testing serves the cause of
justice as an impartial tool; not only does it have
the power to implicate the guilty, it also has the power 
to exonerate the innocent. DNA databanks, in which 
the DNA profiles of convicted offenders are stored, 
have helped law enforcement agents identify serial 
criminals, even when their crimes have been com- 
mitted in different states or occurred many years ago. 
On the other hand, DNA evidence has helped secure the
release of hundreds of wrongly convicted individuals, 
some of whom had been sentenced to death. 


Capitalizing on the Variability of the 
Human Genome DNA Sequence 


The human genome contains genes that encode
proteins, but it also contains a lot more than just the
genes. In fact, only about 2% of the human DNA
sequence codes for proteins; the other 98% does not
(see Chapter 6). The protein-coding sequence of a gene
specifies the amino acid sequence of a specific protein.
Changes in the protein-coding DNA sequence can change
the amino acid sequence of the protein, producing a
mutant product. When a mutant gene produces a
protein that does not work properly, the individual
may have a genetic disorder or genetic disease (see
Chapter 10).

The large amount of non-protein-coding DNA
(noncoding DNA) in the human genome does not produce
proteins, but some noncoding DNA regions do have
functions in the cell. Nevertheless, changes in the
noncoding DNA are usually silent in terms of traits;
because these stretches of DNA do not encode proteins,
even drastic changes in their sequence might not affect
the individual’s development or health. However, it would
be incorrect to think that the noncoding DNA in the
human genome is entirely without function or does
not carry genetic information. Scientists know much
less about the characteristics of noncoding human
DNA, and relatively little research has focused on
noncoding DNA sequences compared with the research
efforts on the regions that code for proteins (see
Chapter 6).

Sequence differences between human genomes occur
at sites in both the coding and the noncoding DNA.

The locations in the DNA sequence where the
genomes of individuals differ are referred to as
polymorphisms (from the Greek poly = many and
morph = form). There are a great many polymorphisms
in the human genome, and now that the entire
sequence of the human genome has been determined,
many of these polymorphisms have been identified and
their positions located on the chromosomes (see
Chapter 6). Because the positions of these polymorphisms
on their respective chromosomes are known, genetic
researchers can use them to map the positions of genes
in the genome. This approach has been instrumental in
allowing researchers to draw maps showing the relative
locations of the different genes on each chromosome.
Because their locations on the human chromosome maps
are known, polymorphisms serve as (and are referred to
as) genome markers, just as mileposts do on a road map.
Many of these polymorphisms are single nucleotide
polymorphisms (SNPs; single base pair differences),
which have become an essential tool for many aspects
of human genetic analysis (see Chapter 10).

Human chromosomes are arranged in 23 pairs
inside most human cells, so each person has two
copies of each gene and two copies of each polymorphic
marker. The exception, of course, is the sex
chromosomes: males have one copy of the X chromosome
and one copy of the Y chromosome, whereas females
have two X chromosomes and no Y chromosome. A gene
can exist in more than one form, depending on its DNA
sequence, and each form of the gene is called an allele.
The different forms of each polymorphic marker are
likewise called alleles of that marker. For each gene (or
marker), one allele is located on one chromosome,
and the second allele lies in the same location on the
other member of that chromosome pair. Together, the
two alleles for any gene or marker in an individual’s
genome are referred to as the individual’s genotype for
that marker. An individual’s DNA profile consists of
the genotypes for all the chromosome markers for
which data were obtained.
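To make this vocabulary concrete, the following minimal Python sketch stores a DNA profile as a mapping from marker name to genotype, using the four hypothetical markers of Figure 8.1. The data structure and the function name `describe` are our own illustration, not a format used by forensic software.

```python
# A DNA profile as a mapping from marker name to genotype.
# Each genotype is the pair of alleles (named by repeat count) found on the
# two copies of the chromosome. Data follow the hypothetical Figure 8.1.
profile = {
    "A": (9, 4),
    "B": (11, 11),
    "C": (7, 7),
    "D": (13, 14),
}

def describe(genotype):
    """Write the genotype smallest allele first and report its zygosity."""
    a, b = sorted(genotype)
    zygosity = "homozygous" if a == b else "heterozygous"
    return f"{a},{b} ({zygosity})"

for marker, genotype in profile.items():
    print(f"Marker {marker}: {describe(genotype)}")
```

Running it prints "Marker A: 4,9 (heterozygous)", then 11,11 and 7,7 (homozygous) for markers B and C, and 13,14 (heterozygous) for marker D. This matches the profile in Figure 8.1.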

The standard forensic testing kits usually include
reagents to detect 11 to 13 chromosome markers in
the human genome. They often also include reagents
to detect a polymorphic marker in the amelogenin
gene, which has copies on both the X and Y
chromosomes (Figure 8.1). A region of the Y-chromosome
copy of the amelogenin gene contains six DNA base
pairs that are not present in the X-chromosome copy.
This small difference in DNA sequence between the X
and Y chromosomes allows investigators to determine
the sex of the individual who is the source of the
evidence sample being tested.


Forensic scientists test regions of DNA sequence,
referred to as markers, that differ between individual
human genomes. Each individual human carries two
copies, or alleles, of each gene or marker, which
constitute the individual’s genotype for that marker;
all of the markers tested make up the individual’s DNA
profile.
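The sex test described above can be sketched as a simple rule on the sizes of the amelogenin PCR products. This is a hypothetical illustration: the X-copy product size is passed in as an assumed parameter (the value 106 bp in the usage lines is only an example), and the one fact taken from the text is that the Y copy carries 6 extra base pairs.

```python
# Hypothetical sketch: infer sex from amelogenin PCR product sizes (bp).
# x_size is an assumed parameter; the Y-copy product is 6 bp longer
# because of the extra 6 bp in the Y-chromosome copy of the gene.
Y_EXTRA_BP = 6

def sex_from_amelogenin(product_sizes, x_size):
    sizes = set(product_sizes)
    y_size = x_size + Y_EXTRA_BP
    if sizes == {x_size}:
        return "female"        # only the X-chromosome product detected
    if sizes == {x_size, y_size}:
        return "male"          # both X- and Y-chromosome products detected
    return "inconclusive"      # unexpected pattern; needs closer review

print(sex_from_amelogenin([106], x_size=106))       # female
print(sex_from_amelogenin([106, 112], x_size=106))  # male
```

Real casework is more involved. For example, a mixed sample containing DNA from a man and a woman also shows both products, so a "male" result means only that male DNA is present.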

The non-protein-coding human DNA contains a 
lot of repeated DNA sequences, including a type of 
repeated sequence called a short tandem repeat (STR). 
As the name implies, in an STR, a short DNA sequence 
is repeated in tandem, without sequences in between 
Chromosome 1
Marker A: alleles 4 and 9; genotype 4,9; heterozygous
Marker B: two copies of allele 11; genotype 11,11; homozygous
Chromosome 2
Marker C: two copies of allele 7; genotype 7,7; homozygous
Marker D: alleles 13 and 14; genotype 13,14; heterozygous
Individual's DNA profile: Marker A 4,9; Marker B 11,11; Marker C 7,7; Marker D 13,14

FIGURE 8.1 Relationships among chromosomes, markers, alleles,
genotypes, and DNA profiles. Chromosome 1 carries two alleles for
marker A, 9 and 4, which are named for the number of repeats of a
specific repeated sequence that exists at that position in the
chromosome 1 DNA. The genome is heterozygous at marker A
because the two alleles have different numbers of repeats. The
genome is homozygous at marker C on chromosome 2 because
each allele has seven copies of the repeated sequence.


Tetranucleotide repeat: Allele number 3, with three repeats

...ATGTGGTTGCATGCATGCATCGCTGAAGGAT...
...TACACCAACGTACGTACGTAGCGACTTCCTA...

Tetranucleotide repeat: Allele number 6, with six repeats

...ATGTGGTTGCATGCATGCATGCATGCATGCATCGCTGAAGGAT...
...TACACCAACGTACGTACGTACGTACGTACGTAGCGACTTCCTA...


FIGURE 8.2 A hypothetical tetranucleotide repeat sequence. The
hypothetical tetranucleotide repeat sequence shown here exhibits
the features of a tetranucleotide repeat in the genome, including
sequences preceding and following the repeat, which are seemingly
random nucleotide sequences. The variability in the number of
tetranucleotide repeats allows forensic investigators to distinguish
the DNA of one individual from that of another; two different
individuals often have different numbers of repeats of the
tetranucleotide sequence.


the repeats. The type of STR that is used for most
forensic DNA testing is the tetranucleotide repeat, in
which a four base pair (4 bp) sequence is repeated
between 5 and 50 times at a given site in the human
genome (Figure 8.2). The DNA sequences preceding and
following the tetranucleotide repeat are seemingly
random sequences.
The variability in the number of repeats at a
tetranucleotide repeat site allows forensic investigators
to distinguish one individual’s DNA from another
individual’s DNA. Quite simply, different individuals
often have different numbers of tetranucleotide repeats
at a given location. These markers are so variable that
many people carry a different number of repeats in each
of their two alleles. When an individual has the same
allele of a gene or marker on both copies of the
chromosome, the individual is said to be homozygous
for that marker. If the two alleles have a different
number of repeats, then the individual is heterozygous
at that position on the chromosome. The different
tetranucleotide alleles are named for the number of
repeats located at a given position on the chromosome
(locus). For example, if an individual had 9 repetitions
of the repeated sequence at the marker A locus on one
chromosome and 4 repetitions at the same locus on the
other chromosome, that individual's genotype for
marker A would be 4,9 (see Figure 8.1).
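Counting the repeats directly in a sequence shows how an allele gets its name. The minimal Python sketch below scans the hypothetical top-strand sequences from Figure 8.2 for the longest run of back-to-back copies of the repeat unit GCAT. It illustrates the idea only; forensic laboratories do not read the sequence this way but infer the repeat count from the length of a PCR product.

```python
def count_tandem_repeats(sequence, unit):
    """Return the largest number of back-to-back copies of unit in sequence."""
    best = 0
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Step forward one repeat unit at a time while the unit keeps matching.
        while sequence.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best

# Top strands of the two hypothetical alleles in Figure 8.2.
allele_3 = "ATGTGGTTGCATGCATGCATCGCTGAAGGAT"
allele_6 = "ATGTGGTTGCATGCATGCATGCATGCATGCATCGCTGAAGGAT"
print(count_tandem_repeats(allele_3, "GCAT"))  # 3
print(count_tandem_repeats(allele_6, "GCAT"))  # 6
```

A person who carried one copy of each of these alleles would have the genotype 3,6 at this marker and would be heterozygous.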


Tetranucleotide repeat markers are used in forensic DNA
testing for most criminal investigations. This type of
marker is a stretch of DNA in which a 4 bp sequence is
repeated between 5 and 50 times; the different alleles are
identified by the number of 4 bp repeats at that site.


Forensic analysts use the polymerase chain reaction
(PCR) (see Chapter 5) to obtain a DNA profile from
field evidence or from a suspect’s reference sample. Even
tiny amounts of DNA can be amplified by PCR to
produce millions of copies of a predetermined DNA
sequence specified by the analyst. In the case of an
STR test, the analyst chooses specific DNA primers that
allow the amplification of the tetranucleotide repeat
DNA plus some of the sequence flanking it on either
side. Because the flanking sequence does not vary in
length from one person to another, the size of the PCR
product directly reflects the number of repetitions of
the repeated sequence that exist at that particular locus.
For example, if an allele containing 10 repetitions of a
tetranucleotide repeat yields a PCR product that is 200
base pairs in size, a different allele containing 15
repetitions of the 4 bp sequence will yield a PCR product
that is 220 bp in size, because it contains an extra 20 base
pairs (5 more repetitions of a 4 bp sequence). By
determining the lengths of the PCR products obtained
from a sample, the forensic analyst can figure out how
many repeated sequences exist in each of the individual's
alleles for each of the markers tested. This is part of the
information that the forensic analyst uses to create an
individual’s DNA profile.
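The arithmetic in the example above can be written out as a short Python sketch. It assumes, as the text does, that the flanking sequence has the same length in every allele. It calibrates that flanking length from one known allele (10 repeats giving a 200 bp product, so 200 − 10 × 4 = 160 bp of flanking DNA) and then converts any product size into a repeat count. The function names are hypothetical. Real laboratories size their products against reference standards and must also handle alleles that contain partial repeats.

```python
REPEAT_UNIT_BP = 4  # a tetranucleotide repeat unit is 4 bp long

def flanking_length(reference_size_bp, reference_repeats):
    """Length of the non-repeat (flanking) part of the PCR product."""
    return reference_size_bp - reference_repeats * REPEAT_UNIT_BP

def repeats_from_size(size_bp, flank_bp):
    """Number of 4 bp repeats implied by a PCR product of size_bp."""
    repeat_bp = size_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % REPEAT_UNIT_BP:
        raise ValueError(f"{size_bp} bp does not fit a whole number of repeats")
    return repeat_bp // REPEAT_UNIT_BP

# Calibrate with the example in the text: 10 repeats give a 200 bp product.
flank = flanking_length(200, 10)       # 160 bp of flanking sequence
print(repeats_from_size(220, flank))   # 15
print(repeats_from_size(200, flank))   # 10
```

The sketch rejects a size such as 221 bp because it does not correspond to a whole number of 4 bp repeats. A real analysis would flag such a result for closer inspection rather than discard it.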
TABLE 8.1 Four-marker DNA profile test determines paternity


           Child   Mother  Father 1  Father 2  Father 3
Marker 1   21,30   30,36   31,34     21,29     21,27
Marker 2   4,10    4,9     6,15      10,11     8,13
Marker 3   16,24   11,24   8,10      16,20     6,22
Marker 4   6,12    12,16   6,10      6,14      7,10


DNA analysts use PCR to amplify regions of genomic
DNA that contain tetranucleotide repeats. The length of the
PCR product for each genome marker indicates the number
of 4 bp repeats in each of the individual’s two alleles.


Using DNA Testing to Establish Paternity 


Individuals inherit one chromosome of each
chromosome pair from the mother and the other
chromosome of the pair from the father (see Chapter 6).
Therefore, for any of the polymorphic markers in the
genome, each individual inherits one allele from the
mother and one allele from the father. A paternity DNA
test can easily establish the identity of a child’s
biological father by determining and comparing the DNA
profiles of the mother, the possible father, and the child
to examine which alleles the child inherited from the
biological father.

Consider the following hypothetical example using
the genetic information provided in Table 8.1. A
four-marker DNA profile is shown for the child, the mother,
and the three men who might be the child’s biological
father. The child has inherited the number 30 allele of
marker 1, the number 4 allele of marker 2, the number
24 allele of marker 3, and the number 12 allele of marker
4 from his or her mother. The father, therefore, has
contributed the number 21 allele of marker 1, the number
10 allele of marker 2, the number 16 allele of marker
3, and the number 6 allele of marker 4 to the child. As
Table 8.1 shows, although potential fathers 1 and 3 each
possess one of the alleles the child inherited from
his or her father, only father 2 possesses all four alleles
that the child inherited from his or her father. Father 2 is
therefore identified as the biological father of the child.
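The "subtraction" reasoning above can be expressed as a short Python sketch using the hypothetical genotypes of Table 8.1. The function names are our own. The sketch only checks whether each candidate could have supplied the child's paternal alleles; actual paternity testing uses more markers and adds a statistical estimate of the strength of the match.

```python
# Genotypes from Table 8.1 (hypothetical data), one tuple per marker.
child = [(21, 30), (4, 10), (16, 24), (6, 12)]
mother = [(30, 36), (4, 9), (11, 24), (12, 16)]
candidates = {
    "Father 1": [(31, 34), (6, 15), (8, 10), (6, 10)],
    "Father 2": [(21, 29), (10, 11), (16, 20), (6, 14)],
    "Father 3": [(21, 27), (8, 13), (6, 22), (7, 10)],
}

def paternal_alleles(child, mother):
    """'Subtract' the maternal allele at each marker.

    Returns, per marker, the set of alleles that could have come from the
    father. If both of the child's alleles could be maternal, either one
    might be paternal, so both are kept.
    """
    result = []
    for kid, mom in zip(child, mother):
        not_from_mom = {a for a in kid if a not in mom}
        result.append(not_from_mom or set(kid))
    return result

def count_matches(candidate, paternal):
    """Markers at which the candidate carries a possible paternal allele."""
    return sum(bool(set(g) & needed) for g, needed in zip(candidate, paternal))

paternal = paternal_alleles(child, mother)   # [{21}, {10}, {16}, {6}]
for name, genotypes in candidates.items():
    print(name, count_matches(genotypes, paternal), "of", len(paternal))
```

The output shows that fathers 1 and 3 each match at only one marker, whereas father 2 matches at all four, which agrees with the conclusion drawn from Table 8.1.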
Paternity tests often involve testing variable-number
tandem repeats (VNTRs, pronounced “vinters”), which
can occur in coding as well as noncoding DNA
sequences in the human genome. A VNTR occurs when
the number of DNA base pairs located between two fixed
positions on a DNA molecule varies between genomes
because the genomes carry different numbers of copies
of a repeated DNA sequence. The combination of VNTR
lengths across the genome forms a pattern that is
specific to each individual human genome. DNA fingerprinting technology
FIGURE 8.3 The results of a DNA fingerprinting analysis using
the 33.1 VNTR DNA probe. The lanes contain (M) mother DNA, 
(C) child DNA, (F1) DNA from possible father #1, and (F2) DNA from 
possible father #2. Comparison of the DNA bands in the different 
lanes indicates that the child's biological father is father #2. 


is based on the detection of these types of DNA differ- 
ences that make human genomes unique (Figure 8.3) 
(see Chapter 10). 


To establish paternity, the DNA analyst determines the DNA 
profile of the child and mother, then “subtracts” the alleles 
the child inherited from the mother to reveal the alleles that 
the child inherited from the biological father. 


Using DNA Testing to Identify Criminal 
Suspects 


Whenever two people make physical contact with each
other, some biological material is likely to be transferred from one
person to another. The material may range from a stray 
hair falling on the coat of the person in the next seat at 
TABLE 8.2 DNA test results from a rape case

           Vaginal swab   Fingernails  Victim  Suspect 1  Suspect 2  Suspect 3
Marker 1   19,22,24,26    22,24        19,26   19,29      20,26      22,24
Marker 2   4,6,9          6,9          4,6     5,8        7,11       6,9
Marker 3   13,22,24,27    13,24        22,27   15,17      14,19      13,24
Marker 4   8,10,11,13     8,10         11,13   9,11       9,12       8,10


lunch to semen that a rapist has deposited in the vagina 
of his victim. The people who perpetrate crimes often 
leave biological materials at the crime scene, challeng- 
ing forensic investigators to extract the perpetrator’s DNA 
from the evidence and identify the DNA profile. 

In this example, forensic scientists are investigating
an alleged rape with three plausible suspects.
Vaginal swab samples from the victim, as well as
scrapings of skin from under her fingernails, are
available for testing. The results of these DNA tests are
presented in Table 8.2, along with the results from
reference blood samples from the victim and the three
suspects.

The vaginal swab sample contains a mixture of 
DNA from the cells lining the victim’s vagina and the 
rapist’s sperm cells, and it therefore contains both of 
their DNA profiles. You can see that, for each marker, 
the alleles that were found in the vaginal swab sample 
represent the alleles possessed by the victim plus the 
alleles possessed by suspect 3. These findings clearly 
implicate suspect 3 as the rapist. Suspects 1 and 2 are 
exonerated, because they each possess alleles that 
are not present in the vaginal swab sample. In addi- 
tion, the DNA profile of the skin that was scraped from 
under the victim’s fingernails clearly matches the DNA 
profile of suspect 3 at all four markers tested. 
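The reasoning applied to Table 8.2 can be checked mechanically. The Python sketch below encodes the table's allele calls as sets and asks two questions about each suspect: do the victim's alleles plus the suspect's alleles account for exactly the alleles in the vaginal swab, and does the suspect carry any allele that is missing from the swab (which excludes him)? The function names are illustrative. Real mixture interpretation also weighs peak heights and possible allele dropout, which this sketch ignores.

```python
# Allele calls from Table 8.2, keyed by marker number.
swab = {1: {19, 22, 24, 26}, 2: {4, 6, 9}, 3: {13, 22, 24, 27}, 4: {8, 10, 11, 13}}
fingernails = {1: {22, 24}, 2: {6, 9}, 3: {13, 24}, 4: {8, 10}}
victim = {1: {19, 26}, 2: {4, 6}, 3: {22, 27}, 4: {11, 13}}
suspects = {
    "Suspect 1": {1: {19, 29}, 2: {5, 8}, 3: {15, 17}, 4: {9, 11}},
    "Suspect 2": {1: {20, 26}, 2: {7, 11}, 3: {14, 19}, 4: {9, 12}},
    "Suspect 3": {1: {22, 24}, 2: {6, 9}, 3: {13, 24}, 4: {8, 10}},
}


def explains_mixture(mixture, victim, suspect):
    """True if, at every marker, the mixture holds exactly the victim's plus the suspect's alleles."""
    return all(mixture[m] == victim[m] | suspect[m] for m in mixture)


def excluded(mixture, suspect):
    """True if the suspect carries an allele absent from the mixture at any marker."""
    return any(not suspect[m] <= mixture[m] for m in mixture)


for name, profile in suspects.items():
    print(name,
          "| explains swab:", explains_mixture(swab, victim, profile),
          "| excluded:", excluded(swab, profile),
          "| matches fingernails:", profile == fingernails)
```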


To implicate a suspect as the source of crime scene evidence, 
the DNA analyst compares the DNA profiles of the suspect 
and the evidence sample. Some evidence samples contain 
material from the perpetrator only, whereas others contain a 
mixture of DNA from the perpetrator and the victim. 


The preceding example postulated a crime for which 
there were three probable suspects, but in real life, there 
are usually no identifiable suspects, just the perpetrator’s 
DNA in evidence left at the crime scene. The nation’s net- 
work of forensic DNA databanks is one of the most pow- 
erful tools used by law enforcement agents to help solve 
crimes. The databanks contain the DNA profiles of peo- 
ple who have been convicted of crimes, as well as DNA 
profiles from evidence samples collected at unsolved 
crimes (Figure 8.4). The Federal Bureau of Investigation’s 
Combined DNA Index System (CODIS) enables local,
state, and federal law enforcement officers to search the
forensic DNA databanks of participating law enforcement 
agencies across the country. Investigators can search 
for DNA profiles from previously convicted offenders 
that match DNA evidence from crimes currently under 
investigation or for evidence that the perpetrator of a 
solved crime has also committed past unsolved crimes. 
By law, previously convicted individuals do not enjoy 
the same level of privacy protection as people without 
criminal records. As a result, once an individual’s DNA 
profile has been entered in a forensic database, law 
enforcement agents can use the individual’s DNA
profile to investigate the possibility that the individual
was involved in another crime, without needing addi- 
tional evidence that gives them probable cause to believe 
that the individual might have been involved in the sec- 
ond crime. 

These DNA databanks have been used to suc- 
cessfully solve hundreds of crimes, including many 
“cold hits” or “cold cases” for which investigators had 
exhausted their leads. The success of these databanks 
has prompted law enforcement agents to support the 
idea of collecting DNA samples from people convicted 
of nonviolent crimes, or possibly even from people who 
were arrested but not convicted of certain crimes. Data from states such as Virginia, which collects DNA from people convicted of certain nonviolent crimes as well as from people convicted of violent crimes, reveal that about half of the violent crimes solved with the help of the state’s DNA databank were matched to DNA profiles from nonviolent offenders. Some
public officials, including the former mayor of New 
York City and 2008 Republican presidential candidate 
Rudy Giuliani, took a more extreme position and advo- 
cated collecting DNA profiles from all U.S. citizens to 
include in law enforcement databanks. This could be 
accomplished if DNA profile testing were included 
among the blood tests already performed on newborn 
babies to test for a number of metabolic disorders and 
genetic diseases (see Chapter 10). With time, these 
databanks would provide law enforcement officers with a powerful tool that would enable them to identify many violent offenders as soon as they committed a crime. However, this proposal and the long-term storage of personal DNA information raise serious ethical issues, and some people suggest that storing the DNA profiles of people who are not convicted offenders constitutes an illegal invasion of privacy. Others argue that collecting DNA from the public for the purpose of storing DNA profiles in the databank is fair and unbiased as long as no individual or group is singled out for inclusion in the database.

FIGURE 8.4 Results of a forensic DNA test: a DNA profile. The figure shows the DNA profile of a sample as produced by a forensic DNA test. The sample’s DNA profile contains the following genotypes (start at the top panel and read left to right):

D8S1179: 13,13    D21S11: 30,30    D7S820: 10,11
CSF1PO: 10,12     D3S1358: 14,15   TH01: 8,9.3
D13S317: 13,13    D16S539: 11,12   D2S1338: 19,23
D19S433: 14,15    vWA: 17,18       TPOX: 8,8
D18S51: 15,19     D5S818: 11,11    FGA: 23,24


A DNA Match Does Not Prove That a Suspect Is Guilty


A match between the DNA profile derived from an evidence sample and that of a suspect does not automatically prove that the suspect is the perpetrator of the crime. There are several reasons why an innocent person’s DNA profile might match the DNA profile of evidence found at a crime scene. A reported match may not be a true match: samples may have been mislabeled, degraded DNA in a sample may have produced an inaccurate DNA profile, or unscrupulous investigators may have tampered with the evidence. It is also possible that the suspect’s biological material was left at the crime scene innocently, before or after the crime was committed.
Sometimes a DNA match is coincidental because 
the crime may have been committed by a close relative 
whose initial DNA profile, using a limited number of 
markers, is the same as the defendant's. Forensic DNA 
testing typically analyzes only a few genetic markers
at first, which makes it possible to obtain what appear 
to be identical DNA profiles for two close relatives. 
However, DNA testing can easily distinguish the DNA 
from close relatives when the DNA testing involves 
enough genetic markers. 

For the prosecutors in a legal case to prove beyond a 
reasonable doubt that they have identified the perpetra- 
tor of the crime, the investigators usually rely not only 
on the DNA evidence but also on witness testimony, the 
circumstantial evidence surrounding the crime, alibis, 
and other evidence that should either confirm or deny 
the suspect's involvement in the crime. The lawyers and 
investigators work to find and fit together the pieces of 
the puzzle before they can conclude with confidence 
that the perpetrator of the crime has been correctly 
identified. If the suspect has an alibi that clearly establishes that he or she was elsewhere when the crime was committed, it is important for the legal system to seek an explanation for the DNA match other than the suspect having committed the crime, even if the suspect’s DNA profile matches that of the evidence perfectly.


A DNA match is strong evidence but alone does not prove a 
suspect is guilty beyond a reasonable doubt. There are many 
possible explanations for a coincidental match between 
an innocent suspect and the evidence, or there may be an 
innocent explanation for the suspect's sample being at the 
crime scene. 


One of the most difficult issues surrounding the 
use of forensic DNA testing evidence in criminal trials 
involves the question of a coincidental match. Although 
it is highly likely that a given suspect DNA profile is 
unique in the human race, this is essentially a statistical 
argument unless in a specific case the forensic analysts 
have a database that includes the DNA profile of every 
person in the suspect pool. Without such a database, the 
analyst cannot definitively state that nobody else in the 
population has the same DNA profile as the suspect. In 
most cases the analyst must determine the probability 
that the same genotype will be selected at random from 
the reference population in the database. 

The American legal system assumes the suspect is 
innocent until proven guilty. When the DNA profile of 
the suspect matches that of the evidence, one is obli- 
gated to consider the possibility that the suspect is 
innocent but coincidentally has the same DNA profile 
as the true perpetrator. If there is no witness or other 
evidence that can narrow down the list of possible 
suspects, the possible suspect pool is usually large. 
Investigators may consider the population of the city, 
the county, or even the country in which the crime 
was committed. The reasonable suspect pool helps to 
limit the size of the database to screen for matches to
the suspect’s DNA profile. The larger the reasonable 
suspect pool, the better the chance that the investiga- 
tor might find an innocent person whose DNA profile 
matches that of the actual perpetrator. 

The probability of a coincidental match is usually 
referred to as the random match probability (RMP), 
which represents the probability that the DNA profile in 
question would be found in a person who was randomly 
selected from the same racial/ethnic group to which the 
defendant belongs. This is a difficult concept for jurors, 
lawyers, judges, and even expert witnesses to compre- 
hend correctly. Many people, including expert witnesses, 
have committed logical fallacies or made misstatements 
in court regarding the meaning of the RMP. 
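The link between the size of the suspect pool and the chance of a coincidental match can be made concrete with a short calculation. This sketch assumes, for illustration only, an RMP of 1 in 10 million and round pool sizes. It also treats every person in the pool as unrelated to the perpetrator and independent of everyone else; close relatives break this assumption. Under those assumptions, the expected number of coincidental matches is the pool size times the RMP, and the chance of at least one coincidental match is 1 − (1 − RMP)^N for a pool of N people.

```python
import math


def expected_coincidental_matches(rmp, pool_size):
    """Expected number of unrelated people in the pool who match by chance."""
    return rmp * pool_size


def prob_at_least_one_match(rmp, pool_size):
    """P(at least one chance match) = 1 - (1 - rmp) ** pool_size.

    math.log1p and math.expm1 keep the result accurate when rmp is tiny.
    """
    return -math.expm1(pool_size * math.log1p(-rmp))


rmp = 1e-7  # hypothetical RMP: 1 in 10 million
for label, pool in [("city", 100_000), ("state", 5_000_000), ("country", 300_000_000)]:
    print(f"{label:8s} pool={pool:>11,d}  "
          f"expected={expected_coincidental_matches(rmp, pool):8.3f}  "
          f"P(at least one)={prob_at_least_one_match(rmp, pool):.4f}")
```

The same RMP that makes a coincidental match very unlikely within a small pool can make one or more such matches nearly certain within a pool the size of a country. This is one reason a small RMP is not, by itself, the probability that the defendant is innocent.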

To calculate the RMP for the DNA profiles for the 
suspect and the evidence, analysts must determine 
how common the suspect's marker genotypes are in 
the general population. Law enforcement agencies 
have collected DNA profiles from many people who 
were not involved in crimes but who volunteered to 
give their DNA to law enforcement agencies to deter- 
mine the frequencies of common marker alleles and 
genotypes in the general population. These reference 
databases enable the analyst to determine how fre- 
quently the different genotypes for the different mark- 
ers appear in the general population, or in other words 
what the probability is that one might find this same 
genotype in an individual who was selected at random 
from the reference population in the database. 

Because some genotypes are more commonly 
found in one racial or ethnic group than in others, most 
forensic analysts maintain separate reference databases 
for DNA profiles of Caucasians, African Americans, 
Southeastern Hispanics, Southwestern Hispanics, Asians, 
and different Native American tribes. The reference 
database that most closely matches the suspect’s eth- 
nic heritage will be analyzed first to determine the fre- 
quencies of the genotypes of the DNA profile in the 
general population. Then the analyst applies the prod- 
uct rule of probability theory to calculate the RMP. The 
product rule states that the probability of a series of 
independent events happening is equal to the product 
of the probabilities of the individual events. For example, if each of three independent events has a probability of 20% (0.2), then the probability that all three events will happen is the product of their individual probabilities (0.2 × 0.2 × 0.2 = 0.008).
In a forensic investigation, the analyst determines the 
probability of finding each of the marker genotypes for 
each DNA profile in the reference population, then the 
analyst multiplies the probabilities of the individual 
genotypes together to determine the overall probability 
of finding that DNA profile in an individual who was 
randomly selected from that reference population. 
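The product rule as described above reduces to a single multiplication. The Python sketch below assumes the markers are inherited independently. It uses hypothetical genotype frequencies, as they might be read from a reference population database, rather than real database values. Using only the first three markers, as might happen with a degraded sample, shows why fewer markers give a much larger RMP. The same function reproduces the three-event example, in which 0.2 × 0.2 × 0.2 = 0.008.

```python
from math import prod


def random_match_probability(genotype_frequencies):
    """Product rule: multiply the per-marker genotype frequencies.

    Assumes the markers are inherited independently of one another.
    """
    return prod(genotype_frequencies)


# Hypothetical genotype frequencies for a nine-marker profile (illustrative only).
frequencies = [0.08, 0.05, 0.11, 0.06, 0.09, 0.04, 0.07, 0.10, 0.05]

all_nine = random_match_probability(frequencies)
first_three = random_match_probability(frequencies[:3])  # e.g., a degraded sample

print(f"9 markers: RMP = {all_nine:.2e} (about 1 in {1 / all_nine:,.0f})")
print(f"3 markers: RMP = {first_three:.2e} (about 1 in {1 / first_three:,.0f})")
```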
The more genetic markers that are available for the forensic analyst to evaluate, the smaller the RMP will be. The random match probability is the likelihood that a person randomly selected from the defendant’s racial/ethnic group would coincidentally have the same DNA profile as both the defendant and the evidence sample. In a case involving a degraded DNA sample, reliable results are obtained from only a few markers, which increases the possibility of a coincidental match. In most cases, however, if data from nine or more genetic markers are available, the RMP is infinitesimal, possibly as low as 1 in 1 quintillion (1 in 1,000,000,000,000,000,000, or 0.000000000000000001). To appreciate how large a number 1 quintillion is, consider that a person counting to 1 quintillion at a rate of one number per second would need approximately 31,700,000,000 (31 billion, 700 million) years.
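The counting comparison is simple arithmetic. This sketch assumes an average year of 365.25 days.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # average year: 31,557,600 seconds
ONE_QUINTILLION = 10**18

# One number per second, so the count equals the number of seconds needed.
years_to_count = ONE_QUINTILLION / SECONDS_PER_YEAR
print(f"Counting to 1 quintillion takes about {years_to_count:,.0f} years")
```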


When a DNA profile match is presented in court, the RMP 
for that DNA profile is included. The RMP represents the 
probability that the DNA profile would be found in a random 
person from the racial/ethnic group to which the defendant 
belongs. The smaller the RMP, the less likely it is that the match is coincidental, and the stronger the evidence that the defendant is the source of the DNA evidence.
Additional Applications of Forensic DNA Testing


Forensic DNA testing has emerged as a versatile tool 
that can be applied to many situations other than the 
investigation of violent crimes. Consider the need to 
identify the human remains of military casualties or victims of mass disasters such as the explosion of TWA Flight 800 off Long Island in 1996 or the attack on
the World Trade Center on September 11, 2001. In 
cases where investigators do not have access to teeth 
to conduct a dental identification, such as in mass dis- 
asters, investigators can use DNA testing to identify 
the victims. In these cases, the concept of the RMP 
does not apply because the pool of people to whom 
the recovered remains might belong is limited to the 
servicemen and servicewomen who were involved in 
the battle, the people who were on the airplane, or the 
people who were in the World Trade Center at the time 
of the attack. In the future, DNA testing should make it possible to positively identify all military casualties. The Vietnam War remains that lay unidentified for years in the Tomb of the Unknown Soldier in Arlington, Virginia, were finally positively identified as belonging to an Air Force pilot who was shot down over Vietnam in 1972.

DNA testing has also been used to establish family relationships in a number of high-profile cases, as well as in thousands of other less publicized cases. A well-known example involves the claims by several women that they were Princess Anastasia, daughter of Tsar Nicholas II and Tsarina Alexandra of the Romanov family. The Romanovs were executed during the Bolshevik revolution, and their remains were dumped into a mass grave near Yekaterinburg. Forensic anthropologists and DNA analysts have identified the remains of the tsar, the tsarina, and several of their children, but none of the remains could be positively identified as belonging to their daughter Anastasia. Through the years, several women have come forward claiming to be Anastasia and therefore entitled to the Romanov family fortune. Their claims can be tested, however, by comparing their mitochondrial DNA (mtDNA) sequences to those of Prince Philip, husband of the present Queen Elizabeth II of England and grand-nephew of Tsarina Alexandra.


Box 8.1 DNA Testing and the Anthrax Letters


DNA testing was very important in the investigation to find the terrorist who sent letters containing Bacillus anthracis (anthrax) spores to Senators Tom Daschle (D-South Dakota) and Patrick Leahy (D-Vermont) as well as to the New York Post and NBC News. Anthrax bacteria can cause a serious disease in humans and livestock; the letter bioattack in 2001 killed five people. Only a few research laboratories in the United States study anthrax, so it was important for investigators to determine the source of the anthrax. Like all DNA genomes, the anthrax genome mutates and changes over time, so if an anthrax sample was moved from one lab to another, it might be possible to trace the movement of the anthrax and eventually identify the lab that was the source of the anthrax spores.

The Federal Bureau of Investigation (FBI) was convinced that Steven Hatfill, a biodefense researcher in the government labs at Fort Detrick, Maryland, was the anthrax bioterrorist, but Hatfill maintained his innocence and eventually filed a lawsuit, claiming invasion of privacy and harassment. By the summer of 2008, as the Justice Department finally settled Hatfill’s suit for nearly $6 million, the FBI had already shifted its focus to U.S. Army researcher Bruce Ivins as the main anthrax suspect. Shortly after Hatfill’s suit was settled, Ivins committed suicide, raising new questions about his connection to the deadly anthrax letters. In August 2008, the FBI reported that it had used DNA testing and electron microscopy to study the anthrax spores from the letters and had found a link to anthrax cultures used by Ivins.

The Centers for Disease Control and Prevention discovered that the original lab “stock” of anthrax bacteria that was used to generate the anthrax spores contained a small number of anthrax mutants that grew into colonies with varied textures, colors, and sizes compared to the wildtype anthrax bacteria. Investigators wanted to follow the anthrax spores from the lab source to the contaminated letters by tracing the changes in the DNA sequences of the anthrax genomes. To identify anthrax cells carrying the rare genome mutations, researchers spread the spores onto many dishes containing solid media in the lab so that each bacterial spore could germinate and grow into a colony of bacterial cells. The rare mutant spores in the population formed colonies that looked very different from the majority of wildtype anthrax colonies, making it easy to distinguish the mutant colonies from the wildtype colonies. Scientists sequenced the genome DNA from the rare anthrax mutants and compared the mutant DNA sequences with genome sequences from the initial lab strain of Bacillus anthracis. This study identified four rare DNA mutations in the anthrax genome that are passed on from one generation of anthrax cells to the next. The genomes from the lab stock of anthrax spores accessed by Ivins contained the same four rare mutations, a potentially important link between the anthrax spores in the attack letters and the anthrax stock in Ivins’s lab.


FIGURE 8.5 A mitochondrion and the mitochondrial DNA genome (mtDNA). (A) This electron micrograph shows the highly folded internal membranes inside the mitochondrion. (B) This electron micrograph shows the circular mtDNA undergoing DNA replication.

There are two major types of DNA in human 
cells, nuclear DNA (nDNA) and mitochondrial DNA 
(mtDNA) (Figure 8.5). In human cells, the nuclear DNA
encompasses all 46 human chromosomes (23 pairs), and
each human chromosome contains a linear double-stranded
DNA molecule that extends from one end
of each chromosome to the other end of the chromo- 
some. In contrast, the mitochondrial DNA (mtDNA) 
is a double-stranded circular DNA molecule that con- 
tains only 37 genes; it is located in the mitochondrion, 
a specialized organelle in the cytoplasm that generates 
energy in the cell. The enzymes in the mitochondrion 
catalyze the biochemical reactions that allow the cells 
in the human body to harvest energy from the carbo- 
hydrates and fats in the diet. 

During fertilization, the sperm contributes lit- 
tle more than its DNA. The rest of the material used 


to create the child, including the mitochondria and 
the mtDNA, is stored in the egg. The mtDNA is therefore inherited solely from the mother, by sons and daughters alike. People who are related through a line of female relatives will all carry the same mtDNA sequence. This was indeed the case for Princess Anastasia and Prince Philip, who were connected through a line of female relatives. The authentic Anastasia would have mtDNA sequences identical to the mtDNA carried by Prince Philip (Figure 8.6).
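The maternal-inheritance rule can be expressed as a walk up a family tree that follows only mothers. The Python sketch below uses a small hypothetical pedigree with made-up labels, loosely patterned on the relationship in Figure 8.6. Two sisters descend from the same matriarch; one sister has a daughter, and the other has a daughter whose son is the first sister's grandnephew. The sketch assumes that no new mtDNA mutations arise along either line. A cousin who is related through a son of the matriarch does not share her mtDNA, because men do not pass on their mitochondria.

```python
# Hypothetical pedigree: person -> (mother, father). Labels are illustrative only.
pedigree = {
    "sister_1": ("matriarch", "patriarch"),
    "sister_2": ("matriarch", "patriarch"),
    "brother": ("matriarch", "patriarch"),
    "daughter_of_sister_1": ("sister_1", "husband_1"),
    "daughter_of_sister_2": ("sister_2", "husband_2"),
    "grandnephew": ("daughter_of_sister_2", "husband_3"),
    "son_of_brother": ("wife_of_brother", "brother"),
}


def maternal_line(person):
    """List the person followed by the mother, the mother's mother, and so on."""
    line = [person]
    while person in pedigree:
        person = pedigree[person][0]  # index 0 holds the mother
        line.append(person)
    return line


def share_mtdna(a, b):
    """Two people share mtDNA if their maternal lines meet at a common woman."""
    return bool(set(maternal_line(a)) & set(maternal_line(b)))


print(share_mtdna("daughter_of_sister_1", "grandnephew"))
print(share_mtdna("daughter_of_sister_1", "son_of_brother"))
```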

Another well-publicized case involving DNA test- 
ing to determine family relationships tested the idea 
that Thomas Jefferson had biological children with his 
black slave Sally Hemings. This was certainly plausible 
since it is well known that slave owners in the South 
had children with some of their slaves. The descend- 
ants of a man named Eston Hemings had claimed for 
generations that Eston Hemings was the biological son 
of Thomas Jefferson and Sally Hemings. Similarly, the descendants of a man named Thomas Woodson claimed that he was the son of Thomas Jefferson and a slave. It was appropriate to use Y chromosome DNA testing in this case, because the men in question were claimed to be Jefferson’s sons, and a Y chromosome passes from father to son along a line of male relatives. Because Jefferson’s only son by his wife died in childhood, there were no male-line descendants of Jefferson himself to test. Instead, it was possible to trace the Jefferson Y chromosome using a DNA sample obtained from a man who had descended from Jefferson’s uncle Field Jefferson (Jefferson’s father’s brother) by an uninterrupted line of males.

FIGURE 8.6 Pedigree showing the relationship between Princess Anastasia and Prince Philip. Princess Anastasia and Prince Philip are connected by a line of female relatives (circles), which means that their mtDNA sequences should match.


DNA testing revealed that the Y chromosome sequences in the descendant of Eston Hemings were identical to those found in the descendant of Field Jefferson. This suggested that Eston Hemings was in fact the biological child of Thomas Jefferson and Sally Hemings. Because the Y chromosome sequence is shared by all men who are connected through male relatives, however, this does not definitively prove that Thomas Jefferson was Eston Hemings’s father. Jefferson’s brother Randolph Jefferson was a frequent visitor at Monticello and also had interactions with the slaves there. However, a large body of historical and anecdotal evidence indicates that Thomas Jefferson had an extraordinary fondness for Sally Hemings, and he is the most likely father of Eston Hemings. In contrast, the DNA testing disproved the claims of the descendants of Thomas Woodson.
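Following fathers instead of mothers shows why a Y chromosome match cannot single out one man among male-line relatives. In this hypothetical Python sketch, "grandfather" stands for the father shared by Jefferson’s father and Field Jefferson. "Field descendant" collapses the many generations of the uninterrupted male line into a single link. The sketch also assumes no Y chromosome mutations along any line. Both Thomas and Randolph Jefferson turn out to share a Y chromosome with the Field Jefferson descendant, so the test alone cannot tell them apart.

```python
# Hypothetical father links (son -> father); intermediate generations are collapsed.
father_of = {
    "Thomas Jefferson": "Jefferson's father",
    "Randolph Jefferson": "Jefferson's father",
    "Jefferson's father": "grandfather",
    "Field Jefferson": "grandfather",
    "Field descendant": "Field Jefferson",
}


def paternal_line(man):
    """List the man followed by his father, his father's father, and so on."""
    line = [man]
    while man in father_of:
        man = father_of[man]
        line.append(man)
    return line


def share_y(a, b):
    """Two men share a Y chromosome if their paternal lines meet."""
    return bool(set(paternal_line(a)) & set(paternal_line(b)))


candidates = ["Thomas Jefferson", "Randolph Jefferson"]
consistent = [m for m in candidates if share_y(m, "Field descendant")]
print(consistent)
```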

In addition to criminal investigations, DNA test- 
ing is also used for biomedical investigations and is 
a common tool in basic DNA research. Like all genomes, viral genomes mutate, and the specific interactions between DNA probes and the altered sequences can be used to identify differences among the human immunodeficiency virus (HIV) genomes that infect different people. Investigators use
the differences in genome sequences to determine the 
source of the virus in a specific infected patient. In 
1992, the analysis of HIV DNA sequences produced 
results that implicated a Florida dentist as the source 
of HIV infections in six of his patients. The dentist was 
believed to have transmitted the virus inadvertently, 
however, and was not convicted of a crime. On the 
other hand, in 1997, a woman accused a physician of 
trying to kill her by injecting her with an HIV-infected 
patient's blood after she ended their romantic relation- 
ship. DNA testing results showed a match between 
the strain of HIV that infected the woman and the HIV
strain carried by one of the physician’s patients. The 
physician was convicted. 
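A greatly simplified version of the comparison is to count the positions at which two aligned viral sequences differ and ask which candidate source is closest to the patient’s virus. The sequences below are short, made-up fragments, and the function names are illustrative. Actual investigations such as those described above rely on much longer sequences and on phylogenetic analysis rather than a single distance count.

```python
def count_differences(seq_a, seq_b):
    """Number of positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def closest_source(patient_seq, candidates):
    """Name of the candidate whose sequence differs least from the patient's."""
    return min(candidates, key=lambda name: count_differences(patient_seq, candidates[name]))


# Hypothetical aligned fragments (illustrative only).
patient = "ACGTTGCAAGTCCGATAGCT"
candidates = {
    "candidate source 1": "ACGTTGCAAGTCCGATTGCT",
    "candidate source 2": "ACGATGCTAGTACGGTAGCA",
}

for name, seq in candidates.items():
    print(name, count_differences(patient, seq))
print("closest:", closest_source(patient, candidates))
```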

Medical investigators also use DNA testing to identify specific pathogens that cause epidemics in a population. In 1993, a viral epidemic broke out in the “four
corners” region of the United States where the borders 
of Arizona, Colorado, New Mexico, and Utah converge, 
and it spread quickly among the Native Americans liv- 
ing in the region, killing 35 people. Subsequent DNA 
testing determined that the little-known hantavirus 
(originally discovered in the Hantaan river region in 
Korea) was responsible for this epidemic. 

DNA testing can also reveal important information 
about the mechanism that is used to spread viral infec- 
tions. The famous 1918 influenza epidemic was the 
most virulent epidemic of all time, killing approximately 
30 million people worldwide. Scientists used DNA 
investigations to identify the specific strain of influ- 
enza virus that was responsible for the 1918 infections. 
They also demonstrated that the virus could infect pigs 
and inside the pig host the virus acquired characteris- 
tics that enabled the virus to infect humans. A similar 
investigation revealed that the avian flu virus that 
affected a number of people in Hong Kong in 1997 
had been transmitted through bird hosts. These inves- 
tigations and many more have revealed important 
insights regarding the mechanism used by these viruses 
to make people sick, and revealed information about 
the mechanisms used to spread the viruses. The H1N1 swine flu virus, whose genome contains human, pig, and avian sequences, caused a pandemic in 2009.

DNA testing has also answered questions regarding the spread of infectious diseases by early European explorers of the Americas. For example, many people think that the explorers who came to the New World in the 1500s brought the tuberculosis bacterium with them from Europe. In 1994, however, a research team at the University of Minnesota recovered DNA from Mycobacterium tuberculosis (Figure 8.7), the organism that causes tuberculosis, in the preserved lung tissue of a Peruvian mummy from the year A.D. 1000. This discovery demonstrated that tuberculosis already existed in the Americas centuries before the European explorers arrived in the sixteenth century.

Forensic DNA testing is not limited to homicide investigations: forest rangers and game wardens have begun to depend on it to identify and prosecute poachers who hunt illegally. Investigators determine the specific DNA profile of an illegally killed animal from the carcass, then match that DNA profile to the meat stored in someone’s freezer or to the trophy animal head hung on a wall. DNA analysis is also an excellent way to ensure that the meat being sold for human consumption actually originated from the expected animals and not from unauthorized sources. Investigators now use DNA testing to identify merchants who sell meat from endangered humpback whales or who market disks of skate meat as scallops.


FIGURE 8.7 Mycobacterium tuberculosis. This organism causes 
tuberculosis (TB). 
DNA testing is a powerful and versatile tool that can be used to solve many different types of crimes, track the spread of disease-causing pathogens, determine paternity, and identify the original source of animal food items.


USING DNA ANALYSIS TO RECONSTRUCT THE ORIGINS OF THE HUMAN RACE


Mitochondrial DNA Enables Researchers to Analyze Ancient Specimens


The circular mitochondrial DNA genome contains very little noncoding DNA in comparison to the large amount of noncoding sequences in the nuclear genome (see Figure 8.5). From the standpoint of DNA testing, this means that there are few locations in the mtDNA where changes in the DNA sequence can be tolerated without seriously altering the function of the mitochondrion. As a result, mtDNA offers many fewer polymorphic sites that the forensic analyst can use to identify the source of crime scene evidence. The nuclear genome is therefore the DNA of choice for forensic testing, but mtDNA is often most appropriate for tracing maternal inheritance (from the mother) and for use in anthropological research. Mitochondrial DNA undergoes mutations at a higher rate than nuclear DNA, which means that mtDNA provides a more powerful tool for DNA studies that track changes in DNA sequences over time. Whereas forensic analysts often compare two DNA samples to look for a match with a specific individual, anthropologists often need to track the changes in the DNA sequence of a population over long periods of time.


Box 8.2 History’s Most Virulent Epidemic Explained. 1918: Could It Happen Again?


There have been numerous epidemics during the course of human history, but none has involved a disease more virulent than the 1918 influenza epidemic. Victims initially presented with the typical flulike symptoms (fever, aches, chills, and a cough), but unlike most influenza infections, which run their course in a week or two, this one often killed its victim within a week after the first symptoms appeared. Approximately 700,000 people died in the United States, and between 20 and 30 million people died worldwide.

Each time researchers analyze the DNA or RNA genomes of different viruses and study the proteins encoded by these infectious agents, they gain valuable insights into the mechanism by which an infectious agent causes a particular disease, as well as into how infectious diseases spread from individual to individual. These insights then guide the development of research strategies to prevent and treat the disease, as well as to prevent the spread of the disease in the event of another outbreak.

Identifying the virus that causes an outbreak can be difficult because the immune system often eradicates the virus quickly from the body, leaving the patient vulnerable to other pathogens. This was the case for the influenza epidemic in 1918, when the influenza victims often died of a secondary infection of bacterial pneumonia. If investigators had been able to perform DNA tests in 1918, they might have had a hard time obtaining genetic evidence of the virus that caused the disease, because there would have been little of the virus left in the body but a considerable amount of bacteria in the lungs. Any attempt to obtain the genetic material of the virus that caused the epidemic is hampered by the fact that the infected tissues contain far more bacterial DNA than viral genome.

Lung specimens from 70 victims of the 1918 influenza epidemic had been stored at the Armed Forces Institute of Pathology (AFIP) in Washington, D.C. Seven of these victims were reported to have died quickly, raising the possibility that viral genetic material could be recovered from their lung samples. Although researchers were unable to obtain viral RNA from six of the victims, a team led by Dr. Jeffrey Taubenberger was able to isolate fragments of viral RNA from one of the seven specimens. These findings suggested that they had found a novel, previously undiscovered virus that was related to the better-known swine flu virus.

Researchers are now trying to piece together the entire sequence of the novel virus’s RNA genome, hoping to discover which of the virus genes endow it with tremendous virulence. Researchers hope to use this information to develop an effective vaccine to protect people against this virus.

Two further advantages of mtDNA are durability and
abundance. The biggest obstacle encountered
when analyzing DNA from ancient specimens is that 
DNA molecules degrade over time. The linear DNA 
molecules residing in chromosomes in the nucleus are 
much less stable than the mtDNA molecules because 
the circular nature of the mtDNA protects the mtDNA 
from some enzymes that degrade DNA. In addition, 
there are only two copies of each chromosome in the 
nucleus, but a single mitochondrion can contain more 
than 10 copies of the circular mtDNA molecule. There 
are up to several hundred mitochondrial organelles 
in a single cell, so there may be thousands of mtDNA 
copies in a single eukaryotic cell. The superior abundance and durability of mtDNA, coupled with its higher mutation rate, make mtDNA an optimal choice for investigations involving ancient specimens.


mtDNA contains fewer polymorphisms than the nuclear
genome DNA and is very useful for maternal inheritance
studies. The mtDNA is much more abundant and dura- 
ble than nuclear DNA and has a higher mutation rate, all 
advantages that make mtDNA an ideal choice for studying 
ancient specimens. 


Tracking Human Migration by Tracking 
DNA Sequence Mutations 


Genomic research has provided estimates of muta- 
tion rates for mitochondrial DNA genomes and for 
nuclear chromosome sequences. Any DNA sequence 
with a known mutation rate can serve as a “molecular 
clock” that anthropologists use to determine how far 
back in time an investigator must go to find a common 
ancestor for people from different geographic regions. 
Scientists can determine the number of differences in 
the DNA sequences between the genomes of two pop- 
ulations and use the mutation rate of that sequence to 
calculate the time to the most recent common ances- 
tor (TMRCA) for the two populations. Anthropological 
geneticists compare the DNA sequences recovered from
fossilized human remains or from people who currently
live in different parts of the world with the genome
of our closest living primate relative, the chimpanzee.
The DNA sequences derived from fossilized specimens
of evolutionarily older hominids (hominids = human-like
two-footed primates) are more similar to the chimpanzee
genome sequence than present-day human genome
sequences are. In addition, present-day humans carry
some of these evolutionarily older sequences in their
DNA, along with sequences that arose in the genome
more recently.
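The molecular-clock calculation described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the mutation rate is constant over time, each differing site reflects exactly one mutation (no repeat mutations at the same position), and, because both lineages accumulate mutations independently after they split, the observed divergence corresponds to twice the time since the common ancestor. The function name and example numbers are hypothetical.

```python
def tmrca_years(differences: int, sites_compared: int, rate_per_site_per_year: float) -> float:
    """Estimate the time to the most recent common ancestor (TMRCA) of two sequences.

    Simplifying assumptions (illustrative, not a specific published method):
    - mutations accumulate at a constant, known rate on both lineages;
    - each observed difference is the result of exactly one mutation.
    Two lineages that split T years ago have together experienced 2 * T years
    of mutation, hence the factor of 2 below.
    """
    if sites_compared <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("sites_compared and rate_per_site_per_year must be positive")
    if not 0 <= differences <= sites_compared:
        raise ValueError("differences must be between 0 and sites_compared")
    divergence_per_site = differences / sites_compared
    return divergence_per_site / (2 * rate_per_site_per_year)


if __name__ == "__main__":
    # Hypothetical example: 20 differences over 1000 compared sites,
    # with an assumed rate of 1 mutation per site per million years.
    years = tmrca_years(differences=20, sites_compared=1000, rate_per_site_per_year=1e-6)
    print(f"Estimated TMRCA: about {years:,.0f} years")
```

With these made-up inputs the estimate is about 10,000 years. Real analyses use rates calibrated for the specific sequence being studied and correct for repeat mutations at the same site, which this sketch ignores.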

Calculating the TMRCA for different population 
groups helps anthropologists to reconstruct the migra- 
tions of early humans (Figure 8.8). In the figure, the 
circles marked with letters A to D represent different 
groups of people, living in different geographic regions. 
In this scenario, the people from whom the present-day 
groups A and D are descended were part of the same 
population. Sometime in the distant past, two groups 
of people split off from that population, one settling 
in region A and one settling in region D. Sometime 
later, two groups of people split off from group A. One 
migrated to region B, whereas the other migrated to 
region C. 

When people first populated regions A and D, they
had similar DNA sequences because they came from
the same original population. However, each time a
sperm or egg cell is made, the DNA is replicated, and
occasional mistakes during DNA replication alter the
new DNA copy at one or more positions. As a single
family expands through several generations, the DNA
sequences of the younger generations will differ from
the DNA sequences of the older generations. The farther
away an individual is from the original generation, the
more differences in DNA sequences exist between the
genomes of the current generation and those of the
family members who originally settled the area. Over
evolutionary time, the DNA sequences of the people
who live in regions A and D became slightly different.

FIGURE 8.8 TMRCA for different population groups helps to recon-
struct early human migrations. This diagram illustrates a hypothetical
scenario in which two population groups (A and D) diverged from
a common population. Later, groups B and C diverged from group
A. The letters A-D represent different groups of people, living in dif-
ferent geographic regions. The people from whom the present-day
groups A and D are descended were at one time part of the same
population. Sometime in the distant past, two groups of people split
off from that population, one settling in region A and one settling in
region D. At a later time, two groups of people split off from group
A; one migrated to region B, and the other migrated to region C.

When groups B and C first split from group A and 
populated their respective geographic regions, they 
had similar DNA sequences. With time, however,
the DNA sequences of people from regions B and C
diverged, producing measurable differences among
the DNA sequences of people from regions A, B, and
C. Even so, the sequences of people from regions A,
B, and C will be noticeably more similar to each other
than they will be to the DNA sequences of people who 
live in other geographic regions and have descended 
from different population subgroups, such as the people 
in region D. A study of the DNA sequences from people 
in geographic regions A, B, C, and D would reveal a 
TMRCA for groups B and C that is considerably more 
recent than the TMRCA for groups A and D, B and D, or 
C and D (Figure 8.8). 
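The scenario in Figure 8.8 can also be expressed as a small tree in which each pair of groups has, as its TMRCA, the split where their ancestries last met. The Python sketch below encodes that tree and looks up the TMRCA for any pair. The internal node names, the split times (60,000 and 20,000 years ago), and the mutation parameters are hypothetical values chosen only to show that B and C share a much more recent ancestor than A and D, and that a constant molecular clock would predict proportionally fewer sequence differences between them.

```python
# Hypothetical encoding of the Figure 8.8 scenario. Internal node names and
# split times are illustrative assumptions, not values from the text.
PARENT = {
    "A": "A_split",  # group A and the two groups that later left it
    "B": "A_split",
    "C": "A_split",
    "A_split": "root",
    "D": "root",     # group D split from the original population
}
SPLIT_TIME_YEARS_AGO = {"A_split": 20_000, "root": 60_000}


def ancestors(group: str) -> list[str]:
    """Return the chain of ancestral nodes from a group up to the root."""
    chain = []
    while group in PARENT:
        group = PARENT[group]
        chain.append(group)
    return chain


def tmrca(group1: str, group2: str) -> int:
    """TMRCA = time of the first ancestral node that the two groups share."""
    shared = set(ancestors(group2))
    for node in ancestors(group1):
        if node in shared:
            return SPLIT_TIME_YEARS_AGO[node]
    raise ValueError(f"{group1} and {group2} share no ancestor in this tree")


def expected_differences(group1: str, group2: str, sites: int, rate_per_site_per_year: float) -> float:
    """Expected differences under a constant clock: both lineages mutate for TMRCA years."""
    return 2 * rate_per_site_per_year * sites * tmrca(group1, group2)


if __name__ == "__main__":
    for pair in [("B", "C"), ("A", "D"), ("B", "D"), ("C", "D")]:
        print(pair, tmrca(*pair), expected_differences(*pair, sites=1_000, rate_per_site_per_year=1e-7))
```

Because the expected number of differences is proportional to the TMRCA in this simple model, the pair B and C is predicted to differ at a third as many sites as A and D, matching the qualitative pattern described in the text.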


Any DNA sequence for which the mutation rate is known 
can serve as a molecular clock. Anthropologists determine 
the number of differences that exist between the DNA 
sequences of two groups and use the known mutation rate 
of the sequence to calculate the time to the most recent 
common ancestor (TMRCA) for those two groups. 


The Debate over the Origins of the 
Human Race 


The origin of the anatomically modern human (AMH),
Homo sapiens, is a subject of significant controversy, but
there are also several points of agreement. Most experts
agree that Homo sapiens evolved from Homo erectus,
a human-like primate that walked upright on two feet.
Most also agree that a wave of migration took Homo
erectus out of Africa and into Europe and Asia
approximately 2 million to 1 million years ago (2.0-1.0 mya).
Finally, most authorities also agree that another wave
of migration out of Africa took place about 100,000
to 200,000 years ago. These recent African migrants
encountered different populations of Homo erectus as
they migrated into Europe and Asia, and they coexisted
in several locations.

The debate concerning the evolution of Homo erectus
into the anatomically modern human (AMH), Homo
sapiens (Figure 8.9), involves three main competing
theories: the uniregional model (recent African origin 
or African replacement), the multiregional model, and 
the assimilation model. The primary difference among 
the three theories involves whether the recent African 
migrants replaced the other hominids they encoun- 
tered without interbreeding with them or whether 
FIGURE 8.9 Homo sapiens and Homo erectus. (A) Human being
(Homo sapiens), male. (B) Artist’s rendering of Homo erectus, which
lived from approximately 1,700,000 to 200,000 years ago and migrated
out of Africa and into Europe and Asia approximately 2 to 1 million 
years ago (2.0-1.0 mya). 


there was significant interbreeding between the recent 
African migrants and other hominids. Scientists are 
using DNA analysis to determine whether the DNA of 
the AMH, Homo sapiens, is derived solely from
the DNA of the most recent wave of African migrants
or whether it includes evidence of interbreeding with the
other hominids that the African migrants encountered 
as they migrated into Europe and Asia. 

Experts who support the uniregional theory think
that the recent African migrants did not interbreed 
with the other hominids they encountered in Europe 
and Asia. Instead, these other hominid groups all 
became extinct, and the more recent African migrants 
replaced them (perhaps by being better able to sur- 
vive in the environment, perhaps by killing them) and 
became the sole forerunners of the modern human 
race. According to these theorists, the recent African 
migrants provided all of the DNA sequences from 
which the DNA genomes of the AMH Homo sapiens
were descended. 

The multiregional model proposes that the original
Homo erectus population migrated into Europe and
Asia and that Homo erectus then evolved into Homo sapiens
simultaneously in several different geographic regions.
This theory postulates that the multiregional evolution
of Homo sapiens involved significant interbreeding
between the more recent African migrants and the
different hominid groups that they encountered, as well
as interbreeding between groups that had populated
different geographic regions. This hypothesis accounts
for the idea that hominids in all geographic regions
developed the characteristics they needed to survive
and evolve into Homo sapiens. Extensive interbreeding
between these different populations would preserve the
gene mutations that allowed offspring to better adapt to
the environment and thrive in all of the different
geographic regions. In contrast, traits that are not
essential to the survival of the individual but do affect
the selection of a mate and survival of the species, such
as facial features, probably evolved differently in the
different geographic regions. This might explain why
some of the cranial and facial features observed in
fossilized specimens are also present in the people who
live in the same geographic region where the ancient
fossil was discovered. For example, robust cheekbones
are seen both in modern Australian Aborigines and in
the fossilized Homo erectus specimens found in
Southeast Asia.

The assimilation model proposes that the recent 
African migrants did interbreed with some other 
hominid groups, but that the degree of interbreeding 
varied greatly from one geographic region to another 
and from one time period to another. This model can 
explain some of the contradictions that appear in the 
research literature. For example, it was firmly established
that the Neanderthals of Europe did not contribute mtDNA
to the pool from which the mtDNA of the AMH, Homo sapiens,
had descended. However, several studies in Australasia suggest
that the ancient hominids who settled this area did contribute
some DNA sequences to the pool from which the DNA of the
AMH Homo sapiens descended.

Three major models have been proposed to explain
the evolution of the AMH Homo sapiens. The earliest
DNA studies supported the African replacement model,
and most experts currently agree that this model is the
most credible explanation of how the AMH Homo
sapiens evolved. When anthropologists study the DNA
sequences of people living in Africa, Asia, and Europe,
the vast majority of the oldest sequences are found
in Africa. In addition, many studies report that modern
human DNA contains only DNA sequences descended
from African migrants and none descended from the
other hominid groups.

One important example involves the study of mtDNA 
from the Neanderthal specimens that were found in 
Europe. It was once believed that the AMH evolved 
from the Neanderthal, and many textbook models 
depicted linear evolution in which the Neanderthal was 
the direct ancestor of the AMH. However, DNA stud- 
ies have shown that the sequence of the Neanderthal 
mtDNA is very different from that of the AMH mtDNA. 
These results indicate that the Neanderthals became
extinct and were replaced by more recent African
migrants, prompting theorists to suggest that, rather than
following a linear path, human evolution may resemble a bush,
with many different branches that grew but ended, and
only one branch that continued on to found the AMH
Homo sapiens.

Some researchers claim that the early studies that 
support the African replacement model do not provide 
a complete picture of the events because the early 
studies relied on analysis of mtDNA and Y chromosome
DNA sequences. The mtDNA is inherited solely from the
mother, and the Y chromosome is inherited solely from the
father, so each traces only a single line of ancestry. As a
result, mtDNA and Y chromosome DNA cannot be used to
look as far back in time as is possible by analyzing autosomal
chromosome DNA. Critics argue that the findings of the mtDNA and
Y chromosome DNA studies must be confirmed with 
research on the autosomal chromosome sequences in 
order to provide the information necessary to resolve 
the debate. However, because nuclear genome DNA 
degrades much more rapidly than the mitochondrial 
DNA, it is difficult to accurately study nuclear DNA 
sequences from ancient specimens using current DNA 
testing technology. Even though some recently pub- 
lished studies support the African replacement model, 
others argue against it in favor of a model that postu- 
lates some level of interbreeding between the African 
migrants and the other hominids. 


Most experts support the model of African replacement, 
which concludes that little interbreeding took place between 
the African migrants and the other hominids in Europe and 
Asia, although other reports indicate that some degree of 
interbreeding occurred. Further studies on autosomal chro- 
mosome DNA sequences will help to resolve this debate. 


The Colonization of the Americas 


The theories that describe the colonization of the Americas 
must account for the presence of three cultural and lin- 
guistic groups: Amerinds, NaDene, and Eskimo-Aleut. 
Before the advent of DNA testing, many theorists believed 
that each of these three groups came to the Americas in 
its own independent wave of migration, but more recent 
DNA studies strongly contradict this theory. Researchers 
discovered four common (and one rare) versions of the 
mtDNA sequence in Native Americans. All four of the 
common mtDNA sequences can be found in people who 
descended from the Amerinds, but only one of the com- 
mon mtDNA sequences is found in the descendants of 
the NaDene, and only two mtDNA sequences are found 
in the descendants of the Eskimo-Aleuts. Most researchers 
conclude that the Americas were populated by a single 
wave of Amerind migrants from Northeast Asia and that 
the NaDene and Eskimo-Aleut peoples diverged from 
the Amerind people after the Amerinds arrived in the
Americas. Although the experts generally agree on these 
points, they disagree on the specific timing of the migra- 
tion, the size of the original founding population, and 
whether the founding population experienced extreme 
or mild bottlenecks as they went from one phase of the 
process to the next. A population bottleneck is an evolu- 
tionary event in which most of the individuals in a popu- 
lation are killed or are otherwise unable to reproduce. A 
bottleneck can increase inbreeding in the smaller pop- 
ulation as a result of a reduced pool of possible mates. 
Different migration dates have been proposed that range
from 40 kya to 13 kya (kya = thousand years ago), and
estimates of the size of the effective founding population
range from 70 to 5000 individuals.
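The mtDNA argument at the start of this paragraph is essentially a subset test: every common mtDNA sequence type found among NaDene and Eskimo-Aleut descendants is also found among Amerind descendants. The Python sketch below expresses that test. The text does not name the four common sequence types, so the labels type1 to type4 are placeholders, and which specific types are assigned to the NaDene and Eskimo-Aleut groups is an assumption made only for illustration. A subset pattern is consistent with a single founding wave but does not by itself prove it.

```python
# Placeholder labels: the text reports four common mtDNA sequence types in Native
# Americans without naming them, so "type1".."type4" are hypothetical, and the
# specific types assigned to the NaDene and Eskimo-Aleut groups are assumed.
OBSERVED_TYPES = {
    "Amerind": {"type1", "type2", "type3", "type4"},  # all four common types
    "NaDene": {"type1"},                              # one of the four
    "Eskimo-Aleut": {"type1", "type4"},               # two of the four
}


def consistent_with_single_source(groups: dict[str, set[str]], source: str) -> bool:
    """Return True if every other group's sequence types are a subset of the source's.

    A subset pattern is consistent with the other groups having descended from
    the source population; it is supporting evidence, not proof.
    """
    source_types = groups[source]
    return all(types <= source_types for name, types in groups.items() if name != source)


if __name__ == "__main__":
    for candidate in OBSERVED_TYPES:
        print(candidate, consistent_with_single_source(OBSERVED_TYPES, candidate))
```

Only the Amerind group passes the test as a possible common source, mirroring the reasoning that the NaDene and Eskimo-Aleut peoples diverged from the Amerinds.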

Most authorities believe that the forerunners of the 
Amerinds (proto-Amerinds) expanded from East Central 
Asia into Northeast Asia sometime before 50 kya and
experienced a period of gradual population growth
around 43-36 kya (Figure 8.10). A subgroup of the
proto-Amerinds is thought to have migrated into the
land on either side of what is now the Bering Strait.
At that time the area was a land bridge that connected
Northeast Asia to North America; rising sea levels later
created the Bering Strait approximately 11-10 kya. The
region was productive grassland, home to many species
of plants and mammals. At approximately 36-16 kya, the
proto-Amerind population grew in size and genetic
diversity, but ice sheets blocked access to the Americas
until about 17-14 kya, when the ice sheets began to melt.
Many believe that the Pacific coast may have been
the first region to become passable, perhaps as early
as 19 kya, and that the first proto-Amerinds to migrate
into the Americas traveled via a Pacific coastal route 
as soon as the retreating ice offered the opportunity. 
As the ice retreated even farther, several more inland 
routes appeared, and other proto-Amerinds migrated 
using these paths. These theories are supported by the
dating of human settlements at the Monte Verde site on
the coast of Chile to approximately 14,500 years ago, which
predates the more inland settlements such as
the Clovis complex in the southwestern United States 
by as much as 2400 years. Once these early settle- 
ments were established, the human population in the 
Americas expanded rapidly, both in size and in geo- 
graphic range, from approximately 16-9 kya. 


Most experts agree that the first Amerinds to settle in the 
Americas migrated along the Pacific coast about 19-17 kya 
and settled as far south as the coast of Chile. Other Amerinds 
migrated to the Americas shortly thereafter, using more 
inland routes that were exposed by the receding glaciers that 
once covered what is now Canada. 
FIGURE 8.10 Ancestors of the Amerinds (proto-Amerinds) expanded
from East Central Asia into Northeast Asia. This diagram shows the 
most likely routes and timing for the colonization of the Americas. 
The ancestors of the Amerinds (proto-Amerinds) expanded from East 
Central Asia into Northeast Asia sometime before 50kya and expe- 
rienced a period of gradual population growth around 43-36kya. 
Reprinted from Kitchen A, Miyamoto MM, Mulligan CJ (2008). 
A Three-Stage Colonization Model for the Peopling of the Americas. 
PLoS ONE 3(2):e1596. doi:10.1371/journal.pone.0001596. 


SUMMARY 


Except for identical twins, each individual has a genome
DNA sequence that is unique, so that when someone leaves biological mate-
rial at a crime scene, it is as if the person had left a call- 
ing card containing his or her name. Similarly, because 
we pass our chromosome DNA sequences to our chil- 
dren by inheritance, every child bears DNA sequences 
that unmistakably match DNA from both biological 
parents. When DNA testing is used to determine pater- 
nity, the analyst determines the DNA profile of the 
child and the mother, then subtracts the alleles that the
child inherited from the mother from the child’s DNA 
profile, which reveals the alleles that the child inher- 
ited from the father. To identify the source of crime 
scene evidence containing DNA, investigators look for 
someone whose DNA profile matches the profile of the 
DNA evidence. Everyone involved in this process must 
remain unbiased, however, because a DNA profile 
match by itself is not proof of guilt; it is one more piece 
of evidence for the jury to weigh. 
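The subtraction step described above can be sketched in Python. The function below works one marker at a time: if one of the child's alleles could have come from the mother, the other allele is a candidate paternal allele. When the child carries both of the mother's alleles, either one could be paternal, so the function returns a set of possibilities rather than a single value. The function names and the genotypes in the example are hypothetical (they are not the genotypes used in the review questions), and the sketch ignores mutations and laboratory artifacts that real paternity casework must consider.

```python
Genotype = tuple[int, int]  # two allele repeat numbers at one marker


def possible_paternal_alleles(mother: Genotype, child: Genotype) -> set[int]:
    """Alleles the biological father could have contributed at one marker."""
    first, second = child
    candidates = set()
    if first in mother:         # child's first allele could be maternal ...
        candidates.add(second)  # ... so the second allele would be paternal
    if second in mother:        # and vice versa
        candidates.add(first)
    if not candidates:
        raise ValueError(f"child {child} shares no allele with mother {mother}")
    return candidates


def paternal_profile(mother: dict[str, Genotype], child: dict[str, Genotype]) -> dict[str, set[int]]:
    """Apply the per-marker subtraction to every marker typed in both mother and child."""
    return {marker: possible_paternal_alleles(mother[marker], child[marker])
            for marker in child if marker in mother}


if __name__ == "__main__":
    # Hypothetical genotypes for illustration only.
    mother = {"M1": (14, 16), "M2": (8, 10), "M3": (21, 23)}
    child = {"M1": (16, 18), "M2": (8, 10), "M3": (23, 23)}
    print(paternal_profile(mother, child))
```

For marker M2 the child carries both of the mother’s alleles, so the sketch can only say that the father contributed either 8 or 10; at M1 and M3 the paternal allele is determined exactly (18 and 23).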

The probability of a coincidental DNA match 
between the defendant and the evidence is referred 
to as the random match probability (RMP). This is the 
mathematical probability that a person who was ran- 
domly selected from the defendant's racial/ethnic 
group would have a DNA profile that would match 
the defendant. The higher the RMP, the less strongly 
the DNA match implicates the defendant as the source 
of the evidence. If the DNA in the sample has not
degraded, forensic DNA analysts can usually get data
from enough genome markers to produce an infinitesimal
RMP, sometimes as low as 1 in 1 quintillion, which makes
a coincidental match extremely unlikely. To calculate
the RMP for matching DNA profiles from the suspect and
the evidence, analysts must determine how often each of
the suspect’s marker genotypes occurs in the general population.
Law enforcement agencies have collected DNA pro- 
files from many people who are not criminals but who 
have volunteered to give their DNA information to 
build better DNA databases so that scientists and law- 
yers can determine the most accurate frequencies of 
common marker alleles and genotypes in the general 
population. 
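A common textbook way to turn marker frequencies into an RMP is the product rule: estimate each genotype’s frequency from its allele frequencies (p² for a homozygote, 2pq for a heterozygote, assuming Hardy-Weinberg proportions) and multiply across markers, assuming the markers are inherited independently. The Python sketch below follows that simplified approach. The marker names and allele frequencies are invented; in practice they would come from a reference database like those described above, and forensic laboratories apply further corrections, for example for population substructure, that this sketch omits.

```python
from math import prod

# Hypothetical allele frequencies for two markers, as might be taken from a
# reference population database (values invented for illustration).
ALLELE_FREQUENCIES = {
    "M1": {14: 0.10, 15: 0.20, 17: 0.15},
    "M2": {20: 0.30, 22: 0.25},
}


def genotype_frequency(allele1: int, allele2: int, freqs: dict[int, float]) -> float:
    """Hardy-Weinberg genotype frequency: p^2 for homozygotes, 2pq for heterozygotes."""
    p, q = freqs[allele1], freqs[allele2]
    return p * p if allele1 == allele2 else 2 * p * q


def random_match_probability(profile: dict[str, tuple[int, int]],
                             frequency_table: dict[str, dict[int, float]]) -> float:
    """Product rule: multiply genotype frequencies across (assumed independent) markers."""
    return prod(genotype_frequency(a1, a2, frequency_table[marker])
                for marker, (a1, a2) in profile.items())


if __name__ == "__main__":
    evidence_profile = {"M1": (14, 15), "M2": (20, 20)}
    rmp = random_match_probability(evidence_profile, ALLELE_FREQUENCIES)
    print(f"RMP = {rmp:.4f} (about 1 in {1 / rmp:,.0f})")
```

With these invented numbers the two-marker RMP is 0.04 × 0.09 = 0.0036, about 1 in 278. Each additional marker multiplies in another small factor, which is how profiles built from many markers reach the extremely small RMPs mentioned in the text. Swapping in a different frequency table changes the result, which is why the choice of reference database matters.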

DNA testing has also proven to be a highly useful 
tool for investigating the evolution and migration of 
humans. Most experts currently support the African
replacement model, which holds that little to no
interbreeding occurred between the African migrants
and the other hominids of Europe and Asia. However,
some research points out flaws in the earlier studies in
light of findings that indicate traces of DNA sequences
from some of these other hominid groups in modern
human DNA.

Although there is still some debate regarding the tim- 
ing of migration and the size of the founding population, 
DNA testing appears to have settled the main points of 
the debate regarding whether the Americas were settled 
by three peoples who migrated to the Americas inde- 
pendently of each other or whether a single wave of 
migrants gave rise to the three distinct cultural/linguis- 
tic groups believed to have been the Americas’ earliest 
settlers. The DNA data suggest that the Americas were 
originally settled by Amerinds and that the NaDene and 
Eskimo-Aleut peoples evolved from the Amerinds. The 
Americas’ first settlers appear to have traveled along a 
Pacific coastal route, whereas later settlers took advantage
of the more inland routes, which developed as the
glaciers that covered present-day Canada receded. 

DNA testing can be performed on any life form, 
including recent as well as ancient specimens, and it 
has become a highly versatile and powerful molecu- 
lar tool. DNA testing is useful for investigating violent 
crimes, not only those involving humans but also acts 
of poaching. DNA studies can help identify military 
casualties and victims of mass disasters and can iden- 
tify the strains of microorganisms used in bioterror 
attacks. DNA testing is now routinely used to trace
family lineages for genealogy and to identify the
specific bacteria or viruses that cause various diseases.
DNA has also offered scientists the power to reach far 
back in time and recreate the evolutionary events that 
created the diverse modern human population. DNA 
has a universal significance to all life forms as the mol- 
ecule that carries genes, and as a result DNA testing 
can be applied to answer a variety of questions. 


REVIEW 


This chapter focuses on two major areas of DNA testing
applications. The first is the traditional DNA
forensics used in law enforcement and by the crimi- 
nal justice system. In addition, many examples were 
presented to demonstrate how forensic DNA testing is 
used in many fields, including paternity testing, iden- 
tifying the perpetrators of violent crimes, and tracing 
the spread of infectious diseases. The second area of 
focus explores the use of DNA testing applied to the 
ongoing debate about the origins of humans and the 
colonization of the Americas. To test your understand- 
ing of these concepts and issues, answer the following 
review questions: 


1. Describe the relationship between the following 
terms: “genotype,” “DNA profile,” “allele,” and 
“polymorphic marker.” 

2. What is meant by the term “polymorphic tetra- 
nucleotide repeat”? 

3. Imagine you are a DNA analyst assigned to a 
paternity case. You test the mother and child and 
produce the following genotypes: 


           Mother   Child
Marker 1   7,9      9,11
Marker 2   27,31    25,31
Marker 3   12,19    16,19
Marker 4   6,12     3,6

What can you conclude about the father’s DNA
profile?

4. Imagine you are a forensic DNA analyst assigned
to a murder case in which there is a match
between the DNA profile of the evidence and
that of the suspect. Why is it important for you
to choose a reference database that has DNA
profiles from people whose racial/ethnic heritage
matches that of the defendant? Under what
circumstances would the use of the wrong database
prejudice the calculation against the defendant
versus tilting the calculation in the defendant's
favor?

5. Imagine you are a defense attorney and you are
defending a murder suspect whose DNA profile
matches that of some hairs that were found on
the victim’s shirt. How many different alternative
explanations can you give for the presence of
your client's hair at the crime scene, other than
the explanation that your client committed the
murder?

6. Describe one other application of forensic DNA
testing, apart from the investigation of violent
crimes.

7. Describe the limitations and advantages inherent
in using mtDNA versus nuclear DNA (nDNA) for forensic
and anthropological investigations.

8. What three hypotheses have been offered as
possible descriptions of the way in which the
anatomically modern human arose on the Earth,
and which of these three hypotheses do most
authorities currently support?

9. What is the most important point of debate
between authorities regarding the DNA pool from
which the modern human DNA has descended?

10. How has the DNA evidence helped settle the
debate as to whether the NaDene and Eskimo-Aleut
peoples evolved from the Amerind people
who first settled the Americas or whether all three
groups coexisted in Northeast Asia and migrated
independently to the Americas?
“Patient Leads Fight for His Life”


Josh Sommer: College Student and Cancer Researcher


“This is not an academic exercise at all. It is a matter of
life and death.” (Josh Sommer, Todayshow.com, February 2009)
Josh Sommer knows a lot about patience from working
in a research lab; experiments cannot be rushed. Sommer’s 
work is a small but vital part of an international effort to 
understand the genetic mechanisms behind cancer cells. So 
Sommer is very patient; he knows a mistake could set the 
project back a week, and thousands of lives are at stake, 
including his. In 2006, Sommer was a freshman at Duke 
University when an MRI showed a tumor pressing on his 
brain stem and growing around major arteries in his brain. 
Sommer’s cancer is very rare and is known as an orphan 
disease because only about 300 people are diagnosed 
with chordoma tumors every year. Usually malignant, 
these tumors grow slowly in the spine or at the base of the 
skull and can spread to other organs. Chordoma tumors 
originate in the cells of the embryonic notochord, which is 
normally replaced by the bones of the spine early in fetal 
development (see Chapter 12). There are no effective treatments
for chordoma, and life expectancy after diagnosis is only about
seven years. Sommer and his mother were devastated by this
horrible news, but they vowed not to give up. Sommer had the
tumor removed, and during his recovery he began to learn as
much as possible about chordoma cancer, but there wasn’t much
to find. Like many
other orphan diseases, rare cancers like chordomas are left 
out of the spotlight and out of the research funding system. 

Then Sommer discovered that one of the world’s leading
chordoma researchers, Dr. Michael Kelley,
worked at Duke. Sommer met with Kelley and asked what 
he could do to help support chordoma cancer research. 
The next week, Sommer began doing research in Kelley's 
lab, searching for the genes that cause chordoma cancer 
(Figure 9.1). Sommer learned that one major obstacle to 
curing chordoma was the lack of chordoma cancer cell lines,
but unfortunately the tumors needed to generate the can- 
cer cell lines were routinely discarded after surgery, and 
chordoma research was not well funded. 

In 2007, Sommer and his mother began the Chordoma 
Foundation, created a chordoma database, and began to 
build an effective international scientific team to study the 
rare cancer. Sommer knows that his best chance for suc- 
cessful treatment requires a coordinated team effort by 
“scientists and doctors working hand-in-hand with patients 
toward this common goal.” Sommer said, “I view every
chordoma patient, family member, doctor and researcher as
teammates in the search for a cure.... Working together, we
can turn our dreams for a cure into reality.”

FIGURE 9.1 Josh Sommer: College student researches his own
type of cancer. (A) Josh Sommer (left) with his research advisor
at Duke University, Dr. Michael Kelley (right). (B) Tumors can
arise from tissues of the nose [1], sinuses (ethmoid sinuses [2],
maxillary sinuses [3]), and the base of the skull [4].

Josh Sommer is one of the first people to work in 
a research lab on his own disease, but he is not alone. 
Cystic fibrosis patient Jeff Pinard worked on the genetics 
of his disease, and Tulane medical student Andy Martin 
studied a cancer even rarer than chordoma called sinona- 
sal undifferentiated carcinoma, which claimed his life in 
2004. Sommer cautions people not to be distracted from 
the urgent reality of the lives that hang in the balance. “For 
me, this is a high-stakes race to outrun my disease,” said
Sommer. “I guess the way I look at it is that there will be
a time for every disease when one can in essence outrun 
their disease.... For Andy his disease was too fast and the 
science too slow.” 
LOOKING AHEAD


The human body is made up of trillions of tiny cells that 
work together to keep the body alive; each cell must 
also have the ability to act in ways that are completely 
independent of the other cells. In fact, each cell has 
the ability to commit to grow by cell division (mitosis) 
and make more cells of the same type or to commit to 
cell suicide (apoptosis), a normal process that rids the 
human body of unneeded cells. Both processes involve 
individual cells, but in the end they both benefit the 
entire cell population. These processes reflect steps 
toward a longer-term goal such as the development of 
fingers in the human embryo; the cells that make up 
the webbing between the digits die off as a result of 
apoptosis. Cells follow different developmental path- 
ways as a result of interpreting the signals transmitted 
between cells. Cancer cells are of particular interest 
because they fail to follow the rules that control the 
cell cycle and begin to grow out of control. To under- 
stand these different cell fate decisions and responses, 
we need to know how genes and proteins control the 
eukaryotic cell cycle and what happens to trigger the 
development of deadly cancer cells. 

On completing the chapter, you should be able to 
do the following: 


• Understand how protein receptors on the
surfaces of certain cells can participate in deciding
cell fate.

• Explain the important differences between mitosis
and meiosis in terms of maintaining or changing
the total number of chromosomes in the cell.

• Explain how the four stages of the eukaryotic cell
cycle are related to the part of the cell cycle called
interphase.

• Describe how the checkpoint feedback system is
used to avoid creating potential cancer cells carrying
unstable genomes.

• Explain why cell suicide (apoptosis) is an important
process for healthy cells even though it sounds like
a strange choice for an individual cell to make.

• Understand that tumor cells need a blood supply to
grow, and explain how scientists have taken advantage
of this feature of cancer cells to develop potential
anticancer drugs.

• Describe the types of genes and proteins that directly
cause cancer cells to develop.


INTRODUCTION 


The cells in the human body do not exist alone, but 
are members of a network of diverse communities of 
cells that make up the different tissues and organs in 
the body. Some organs are made up mainly of one type of
specialized cell, whereas other organs contain several
different types of cells; in either case, the cells contribute
to the overall functions of the organs. For example, the kidney and
heart contain different cell types and have different 
functions. The pancreas contains several different types 
of cells with different specialized functions, including 
the islet cells that produce and secrete insulin into the 
bloodstream. This chapter focuses on the biological 
processes that determine the developmental “fates” of 
different cells during the life span of the organism. The 
chapter explores the biochemical processes that cause 
cells to change fate, usually by altering the expression 
of certain genes that determine which pathways the 
cell will follow: 


• Fate 1: cell division (reproduction to make more cells identical to the parent cells)

• Fate 2: cell differentiation (specialization changes cell structure and function)

• Fate 3: cell death (apoptosis to rid the body of selected cells)


It could be argued that fate 4 is the prolonged qui- 
escent state adopted by highly specialized cells such 
as memory T and B cells and egg cells in females, but 
we will focus on the three major cell pathways. Cells 
in the human body are constantly lost and replaced. 
During a lifetime, about 100 trillion cells in the human 
body will age, die, and be replaced. The epithelial 
cells lining the intestines routinely slough off and are 
replaced with new epithelial cells. The human body 
normally changes all of its skin about once every 
month, shedding 30,000 dead skin cells every minute! 
Cell renewal is necessary so that the tissues and organs 
continue to function. The blood cells in the human body 
are replaced continuously, including the red blood 
cells that carry oxygen to the body tissues. The body's 
immune system must also routinely replace white blood cells (including B and T lymphocytes), many of which have very short life spans. These immune system cells are replaced by the
rapidly dividing blood precursor stem cells located in 
the marrow of the long bones. 

Whether present in the fetus before birth or years 
later in the adult human body, cells routinely change 
developmental pathways in response to many different 
signals received from the environment and to interac- 
tions with neighboring cells. Cells also interpret infor- 
mation from distant parts of the body such as the brain. 
Signals can come in the form of protein hormones that 
are released into the circulating blood, an efficient 
process because the blood comes into contact with all 
cells and tissues. Hormones and other circulating fac- 
tors can prompt cells to change gene expression pat- 
terns, which in turn causes cells to begin to develop 
into specialized cells (Figure 9.2). 
FIGURE 9.2 Cell division can have more than one outcome.
(A) Cell division by mitosis (reproduction to make more of the same 
types of cells). (B) Cell differentiation (cell specialization to change 
cell structure and function). (C) Cell death (apoptosis or programmed 
cell death to rid the body of unwanted cells). 


Every cell in an individual human body contains the same DNA genome regardless of cell type (except for red blood cells, which lack a nucleus, and sperm and egg cells, which are haploid). It is not differences in DNA sequence that allow specialized cells to perform different jobs in the human body. The roughly 220 different types of cells in an individual’s body all contain essentially the same genome DNA sequence. But different cells in the body do express different genes
at different times. This fundamental concept explains 
how the cells with identical genomes can perform 
such diverse functions in the human body. Differential 
gene expression permits lung cells to absorb oxygen, 
stomach cells to express digestive enzymes, and brain 
cells to send nerve impulses; even though all of these 
cells carry identical DNA genomes, the specialized 
functions are dictated by the specific genes expressed 
in the individual cells involved. In human cells, this 
means that at any one time a myriad of different genes 
in the human genome are being turned on and off in 
response to complex external and internal cellular sig- 
nals. These cellular events trigger many of the changes 
that occur during a human lifetime, from removing the 
skin cells between the fingers of a developing embryo 
to the explosion of sex hormones in the teen years and 
the slow growth of a cancerous tumor later in life. 
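The idea that identical genomes can produce different cell types through differential gene expression can be pictured as a simple data structure. In this toy Python sketch, every cell type shares one genome (a set of gene labels) and differs only in which subset is switched on. The gene labels are hypothetical placeholders, not real gene symbols, and real cells express thousands of genes.

```python
# Toy model: the same genome in every cell type, different genes switched on.
# Gene labels are hypothetical placeholders, not real gene symbols.
GENOME = frozenset({
    "energy_metabolism",      # "housekeeping" genes used by almost every cell
    "ribosome_parts",
    "insulin",                # pancreatic islet cells
    "digestive_enzyme",       # stomach cells
    "nerve_impulse_channel",  # brain (nerve) cells
    "gas_exchange",           # lung cells
})

HOUSEKEEPING = {"energy_metabolism", "ribosome_parts"}

# Each cell type expresses the housekeeping genes plus its specialized gene.
EXPRESSED = {
    "pancreatic islet cell": HOUSEKEEPING | {"insulin"},
    "stomach cell": HOUSEKEEPING | {"digestive_enzyme"},
    "brain cell": HOUSEKEEPING | {"nerve_impulse_channel"},
    "lung cell": HOUSEKEEPING | {"gas_exchange"},
}

for cell_type, genes_on in EXPRESSED.items():
    # Every cell type draws only on genes already present in the shared genome.
    assert genes_on <= GENOME
    genes_off = GENOME - genes_on
    specialized = sorted(genes_on - HOUSEKEEPING)
    print(f"{cell_type:22} specialized genes on: {specialized}  genes off: {len(genes_off)}")
```

The point of the model is that no cell type adds or removes genes. Only the "on" subset changes.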

This chapter focuses on cell fates, including cell 
division (reproduction) to make more body cells (or 
gametes), cell specialization to change the function 
performed by the cell, and programmed cell death 
(apoptosis) to rid the body of unneeded cells. 


FATE 1: CELL DIVISION AND 
REPRODUCTION 


FIGURE 9.3 The top layers of human skin cells are dead or dying. (A) Microscopic image of stained human skin cells (cross section). (B) Diagram indicating the different layers of the human skin.

FIGURE 9.4 Macrophage cells engulf bacteria by phagocytosis. (A) A human macrophage cell (pink) devours the much smaller bacteria (yellow rods) on the outer surface of a blood vessel. (B) A macrophage cell extends a long, thin projection to capture a nearby bacterial cell.

Healing a wound is a common human experience that offers a good opportunity to observe how the body responds to trauma involving the top layers of skin cells (Figure 9.3). The healing process starts when the undamaged cells near the site of the injury multiply by mitosis to replace the destroyed cells. The purpose of mitotic reproduction is to make new cells that
are genetically identical to the parent cells, contain 
the same number of chromosomes, and can perform 
the same functions in the body. In the skin the process 
of regeneration involves three steps called the inflam- 
matory, proliferation, and remodeling phases. Wound 
cleanup starts during the inflammatory phase, when 
cellular debris is engulfed by macrophage cells 
and eliminated by phagocytosis (Figure 9.4). Then 
signal proteins are released that instruct the cells 
near the edge of the wound to begin mitosis (cell 
division) and produce replacement cells. During this proliferation phase, cells migrate to new positions in the wound area and new blood vessels grow into it. The fibroblast cells secrete collagen and fibronectin into the wound area to build a new extracellular matrix (ECM) around the outside of the cells. Epithelial cells grow over the wound, and myofibroblasts cause the wound site to contract. During the final remodeling phase, collagen deposits are remodeled and realigned, all with the goal of healing the wound with as little scar tissue as possible. Finally, programmed cell death (apoptosis) is activated in selected cells, removing unnecessary and damaged cells left after cellular remodeling is complete.

The healing characteristics of skin cells are amaz- 
ing, but the fact is that most organs and tissues in the 
human body do not respond to injury or disease by 
cellular regeneration. A well-known example is injury 
to the spinal cord. Even though some damaged nerve cells outside the brain and spinal cord can regenerate, the nerve cells of the spinal cord cannot rebuild a damaged cord, so the damage remains and some level of paralysis is usually the result.
The human liver is unusual because it is one of the few
organs in the body that can totally regenerate. Injury 
to the liver triggers the liver cells to undergo mitotic 
cell division to produce more liver cells that repair the 
damage to the liver. 


Chromosome Number is Controlled 
by Cell Division 


Chromosome number (ploidy) is a key feature of every eukaryotic cell, and it is carefully controlled during cell division. Most of the cells in the human body are diploid cells containing two copies of each chromosome (46 total) per cell (see Chapter 6). Each haploid sperm
or egg cell carries 23 chromosomes and contributes 
one human genome to the fertilized egg, producing 
a diploid zygote with 46 chromosomes (Figure 9.5). 
During each cell cycle, the entire DNA genome must 
be replicated and the duplicated chromosomes must 
be correctly transferred to the offspring cells. A high- 
fidelity chromosome segregation process is essential 
for the correct chromosomes (and genes) to be inher- 
ited by the progeny cells. A defect in this process of 
chromosome segregation can cause cells to inherit 
the wrong chromosomes (and genes) when the cells 
divide. Before a diploid cell can undergo cell division 
to make two new cells, the DNA in its 46 chromosomes must be duplicated, producing 92 chromatids (46 pairs of identical sister chromatids), which are then segregated during cell division
to produce two progeny cells with 46 chromosomes 
each. When cells divide by mitosis, chromosome 
number (ploidy) is maintained (Figure 9.6). When cells 
divide by meiosis, chromosome number is decreased 
by half (Figure 9.7). 
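The chromosome bookkeeping described above (46 chromosomes, 92 chromatids after duplication, two cells with 46 after mitosis, four cells with 23 after meiosis) can be checked with a small Python sketch. It tracks only chromosome counts and ignores which homologous chromosome ends up where. The `Cell` class and the function names are hypothetical, introduced only for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    chromosomes: int          # chromosome count (46 in a diploid human cell)
    replicated: bool = False  # True after S phase: each chromosome = 2 sister chromatids

    @property
    def chromatids(self) -> int:
        return self.chromosomes * (2 if self.replicated else 1)

def replicate(cell: Cell) -> Cell:
    """S phase: DNA is copied, so each chromosome now has two sister chromatids."""
    return Cell(cell.chromosomes, replicated=True)

def mitosis(cell: Cell) -> list[Cell]:
    """Sister chromatids separate: two daughter cells, same chromosome number."""
    if not cell.replicated:
        raise ValueError("DNA must be replicated before mitosis")
    return [Cell(cell.chromosomes) for _ in range(2)]

def meiosis(cell: Cell) -> list[Cell]:
    """Meiosis I separates homologs, meiosis II separates sister chromatids:
    four daughter cells, each with half the parental chromosome number."""
    if not cell.replicated:
        raise ValueError("DNA must be replicated before meiosis")
    return [Cell(cell.chromosomes // 2) for _ in range(4)]

duplicated = replicate(Cell(46))
print("chromatids after S phase:", duplicated.chromatids)               # 92
print("after mitosis:", [c.chromosomes for c in mitosis(duplicated)])  # [46, 46]
print("after meiosis:", [c.chromosomes for c in meiosis(duplicated)])  # [23, 23, 23, 23]

egg, sperm = Cell(23), Cell(23)
print("zygote:", Cell(egg.chromosomes + sperm.chromosomes).chromosomes)  # 46
```

Raising an error when an unreplicated cell tries to divide mirrors the rule in the text that DNA replication must come before division.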

Although eukaryotic cells are extremely diverse 
in form, function, and lifestyle, all eukaryotic cells 
reproduce by following the same general life plan. It 
is essential to understand the basic steps in cell divi- 
sion to be able to appreciate what occurs when 
cancer cells grow out of control. The eukaryotic cell 
division cycle contains four main stages, which are 
always executed in the same order: G1, S, G2, and M 
(Figure 9.8). Each stage of the cell cycle is dedicated to 
completing a series of specific events that must occur 
as the cell transits through each successive stage of 
the cell cycle. The overall plan makes sense: the cell 
must replicate its DNA (duplicate the chromosomes) 
before the cell can proceed to divide into two cells, 
with each progeny cell inheriting an equal number of 
chromosomes from the parent cell. Briefly, cells in G1 
are preparing for DNA replication in S phase (the DNA 
synthesis phase). Cells in G2 have already duplicated 
their chromosomes and are preparing to segregate the chromosomes to the progeny cells in mitosis (M). Cells typically spend most of their time (about 95%) in the G1, S, and G2 phases, which together make up interphase. Mitosis (M), the part of the cell cycle where the important process of chromosome segregation (the actual molecular inheritance of genes) occurs, is often the shortest part of a cell’s life cycle (see Figure 9.8).

FIGURE 9.5 The human life cycle. Meiotic cell divisions produce egg or sperm cells in ovaries and testes, respectively. Each egg or sperm cell carries one copy of each chromosome; there are 23 chromosomes in each haploid gamete (1n). When an egg is fertilized by a sperm cell, the result is a diploid zygote containing 46 chromosomes (two copies of each chromosome; 2n) (see Chapters 6 and 10).

FIGURE 9.6 Mitotic cell division maintains chromosome number. In mitosis and meiosis, the DNA in the chromosomes replicates and the duplicated chromosomes line up in metaphase to be distributed correctly to the progeny cells. Mitotic cell division maintains the parental number of chromosomes.

FIGURE 9.7 Meiotic cell division reduces chromosome number. In mitosis and meiosis, the duplicated chromosomes line up in metaphase to be distributed correctly to the progeny cells at the end of the first of two nuclear divisions (meiosis I and meiosis II). The final cell products of meiosis are four haploid daughter cells, each cell containing one-half of the parental number of chromosomes (23 chromosomes in each human haploid cell). Meiotic cell division reduces the chromosome number from diploid (2n) to haploid (1n) cells.

Cells in the G1 stage of the cell cycle are metaboli- 
cally active, but DNA replication does not start until the 
cells transit into S phase. After the chromosomes dupli- 
cate, the cell enters G2 and continues to grow in size, 
synthesizing many different proteins in preparation for 
mitosis. For every cell division, the cell assembles an 
apparatus of special fibers (microtubules) that function 
to properly move (segregate) the duplicated chromo- 
somes when cells divide in mitosis and meiosis. Once 
mitosis is finished, the process of cytokinesis physically 
creates the two progeny cells and the cell cycle is com- 
plete. The future health of any cell depends on inheriting 
a complete genome, so the process of chromosome segregation during cell division is arguably the most important stage in the cell cycle. Nonetheless, a successful cell cycle also depends on the cell’s ability to complete each stage of the cell cycle.
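The fixed order of the four stages, and the statement that interphase takes up about 95% of the cycle, can be illustrated with a short Python sketch. The 24-hour cycle length is an assumption (a figure often quoted for dividing human cells in culture). The names `Phase`, `next_phase`, and `mitosis_hours` are hypothetical.

```python
from enum import Enum

class Phase(Enum):
    G1 = "G1"
    S = "S"
    G2 = "G2"
    M = "M"

# The four stages always run in this order; after M the cycle returns to G1.
ORDER = [Phase.G1, Phase.S, Phase.G2, Phase.M]
INTERPHASE = {Phase.G1, Phase.S, Phase.G2}

def next_phase(phase: Phase) -> Phase:
    """Return the stage that follows `phase`, wrapping from M back to G1."""
    i = ORDER.index(phase)
    return ORDER[(i + 1) % len(ORDER)]

def mitosis_hours(cycle_hours: float, interphase_fraction: float = 0.95) -> float:
    """Time left for M phase if interphase takes the given fraction of the cycle."""
    return cycle_hours * (1.0 - interphase_fraction)

phase = Phase.G1
for _ in range(5):  # one full cycle plus the start of the next
    label = "(interphase)" if phase in INTERPHASE else "(mitosis)"
    print(phase.value, label)
    phase = next_phase(phase)

# Assumed 24-hour cycle for illustration.
print(f"M phase lasts about {mitosis_hours(24):.1f} h of a 24 h cycle")
```

With these assumptions, mitosis occupies only a little over an hour of a day-long cycle. This is why it is often the shortest part of a cell’s life cycle.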

To control each cycle, eukaryotic cells use a check- 
point surveillance system to obtain feedback informa- 
tion about the progress of each stage of the cell cycle 
(Figure 9.9). The checkpoint system is an important pro- 
tection against the formation of dangerous cancer cells. 
The checkpoint system significantly reduces the number 
of potential cancer cells with abnormal genomes that 
escape the safeguards provided by the genome repair 
and apoptosis pathways. 


FIGURE 9.8 The eukaryotic cell cycle consists of two unequal parts, mitosis (M) and interphase (I). The four stages of the cell cycle always occur in the same order: G1, S, G2, M (clockwise). During
interphase (G1, S, and G2) the chromosome DNA is replicated 
and the chromosomes are duplicated. During mitosis the dupli- 
cated chromosomes are segregated (moved) in the dividing cell 
so that each daughter cell inherits one copy of each chromosome. 
G1: gap before S phase, S: DNA replication, G2: gap after S phase, 
M: mitosis. 
FIGURE 9.9 The eukaryotic cell cycle is controlled by checkpoint
genes. The cell cycle feedback system monitors the key transition 
points in the cell cycle, such as when a cell passes from G1 into S 
phase (called the G1/S transition; restriction point or “Start”) or from 
G2 into M phase (G2/M transition). 


CANCER CELLS GO TO THE “DARK SIDE” 
AND EVADE CELL CYCLE CONTROL 


The checkpoint surveillance system is a biochemical 
feedback mechanism used by eukaryotic cells to moni- 
tor the progress of each cell cycle. As the cell finishes 
each stage of the cycle, the checkpoint feedback sys- 
tem signals that it is safe for the cell to proceed to the 
next stage of the cycle. For example, if a cell does not 
finish replicating its chromosome DNA during S phase, 
it is dangerous for the cell to start mitosis (M phase) 
with partly replicated genome DNA. In this situation 
the checkpoint system detects the incomplete replication and sends a signal to temporarily delay the cell cycle at the G2/M transition point until DNA replication is complete (Figure 9.9). The cell cycle delay is only a temporary solution, however, because with time the cell can overcome the checkpoint delay and continue to
divide. Cells that proceed with mitosis before genome 
replication is complete risk significant genome damage 
including the accumulation of multiple mutations and 
abnormal chromosome distribution during mitosis. An 
“unstable” genome can cause several forms of human 
cancer. Inheriting the wrong number of chromosomes, 
a condition called aneuploidy, is often lethal for cells 
missing one or more chromosomes. 
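The "delay, but only for so long" behavior described above can be expressed as a toy decision rule. In this Python sketch the cell arrives at the G2/M checkpoint with part of its genome replicated. Replication continues at a fixed rate while the cell is held, and after a maximum delay the checkpoint is overridden. Replication is tracked in whole percent, and the rates and delay limits are hypothetical numbers chosen only to show the two possible outcomes.

```python
from __future__ import annotations

def g2m_checkpoint(percent_replicated: int, hours_delayed: int,
                   max_delay_hours: int) -> str:
    """Toy G2/M decision: 'proceed', 'delay', or 'adapt' (override the delay)."""
    if percent_replicated >= 100:
        return "proceed"   # genome fully copied: safe to segregate chromosomes
    if hours_delayed < max_delay_hours:
        return "delay"     # give DNA polymerase more time to finish replication
    return "adapt"         # delay overridden: mitosis with an incomplete genome

def time_to_mitosis(percent_replicated: int, percent_per_hour: int,
                    max_delay_hours: int) -> tuple[str, int]:
    """Hold the cell at the checkpoint, hour by hour, until it leaves."""
    hours = 0
    decision = g2m_checkpoint(percent_replicated, hours, max_delay_hours)
    while decision == "delay":
        percent_replicated = min(100, percent_replicated + percent_per_hour)
        hours += 1
        decision = g2m_checkpoint(percent_replicated, hours, max_delay_hours)
    return decision, hours

# Hypothetical numbers chosen only to show the two outcomes.
print(time_to_mitosis(80, 10, max_delay_hours=5))  # ('proceed', 2)
print(time_to_mitosis(50, 5, max_delay_hours=4))   # ('adapt', 4)
```

In the second case, replication is still incomplete when the delay limit runs out. The cell enters mitosis anyway, which is the risky situation that can lead to genome instability and aneuploidy.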

A relatively small number of human genes encode 
proteins with important roles in the prevention, initia- 
tion, and progression of cancer. In many types of can- 
cer, the genes that normally regulate the cell cycle are 
mutated, causing the cells to grow out of control. Proto-oncogenes are normal genes in the human genome that help control cell growth, but the accumulation of multiple DNA mutations can convert proto-oncogenes into dangerous oncogenes and can inactivate tumor suppressor genes, which normally restrain cell division (Figure 9.10).


The cell cycle requires a series of highly regulated events 
that depend on the proper expression of specific cell divi- 
sion cycle proteins. Gene mutations can destroy normal 
cell cycle control and cause checkpoint failure. Without 
proper cell cycle surveillance, the cells begin to grow rap- 
idly without control, the key hallmark of cancer cells. 


Human genome DNA is wrapped by special proteins 
into chromosome packages that are further protected 
by residing inside the double-membrane cell nucleus. 
Despite these protections, human genome DNA can be 
easily damaged by environmental factors such as x-rays, ultraviolet radiation from the sun, physical trauma, and high heat. Mutations can also accumulate through mistakes made during DNA replication. A damaged genome might contain nicks and breaks in the DNA helix, creating regions of chromosome DNA that cannot be accurately copied during S phase or properly segregated in M phase (mitosis). The consequences of
these damaging events can have a devastating impact 
on the cells and the organism (Figure 9.11). 

Cells that divide while carrying damaged chromosome DNA are vulnerable to lethal events involving chromosome instability, including inheriting the wrong number of chromosomes or suffering broken or rearranged chromosomes. Multiple mutations leading
to genome instability are key steps in the formation of 
cancer cells. 


Cell Division Cycle Genes Control 
Cell Growth 


The biochemical feedback mechanism that controls the 
progression of the cell cycle is essential to avoid the 
devastating consequences that occur when cells 
attempt to divide with damaged genomes. The cell 
cycle feedback system monitors the status of the 
entire genome and identifies DNA damage and other 
genome abnormalities. In the event that DNA damage 
is detected, a feedback signal is sent that causes the 
cell to delay the forward progress of the cycle, giving 
the cell time to repair the DNA damage or to complete 
DNA replication. This feedback system is an important 
protection against the formation of cancer cells 
because it is an effective way for out-of-control cells 
to be identified and destroyed, sometimes by inducing 
programmed cell death (apoptosis) in the cancer cells. 
FIGURE 9.10 Multiple genome mutations can cause cancer cell development. When the cells in a tissue or organ suffer genome damage, the accumulation of genetic errors can cause cancer cells to develop.


FIGURE 9.11 Cancer cells can develop from different types of cells. Cancer cells are shown. (Left to right) Top: breast, lung, neuroblastoma 
[microtubules (purple), actin (green)] Bottom: Sarcoma (nucleus is orange), prostate cancer cells. 
FIGURE 9.12 Budding yeast cell cycle. The cell cycle of the budding yeast Saccharomyces cerevisiae proceeds from left to right. A single yeast cell at left has a nucleus and is beginning to make a bud. DNA replication (S) takes place, followed by mitosis (M) to distribute chromosomes (and nuclei) to the new cell (growing bud), which separates from the parent cell by cytokinesis.


The cell cycle feedback system monitors the key 
transition points in the cell cycle, when a cell passes 
from G1 into S phase (called the G1/S transition) or 
from G2 into M phase (G2/M transition). Special proteins search the genome for DNA damage and report the status of the genome to the cell cycle control system, and the cell interprets this information before the cell cycle is allowed to continue. In cases where the cell senses damage to the genome DNA at the G1/S transition, the DNA must be repaired before the genome can be replicated. Similarly, if the cell senses
genome damage at the G2/M transition, the cell must 
repair the DNA before the chromosomes can be safely 
segregated during mitosis. 

When the cell receives a “DNA damage” signal at 
either the G1/S or the G2/M transition, the checkpoint
system responds by delaying the progress of the cell 
cycle, giving the DNA repair enzymes time to correct 
the damage or the DNA polymerase time to finish rep- 
licating the chromosome DNA. The cell cycle delay 
is designed to give the cell time to successfully cor- 
rect the genome defects. In cases where the cell fails 
to send or receive the DNA damage signals, continued 
uncontrolled growth causes serious consequences such 
as genome instability, chromosome loss, and chromo- 
some rearrangements. Failure of the cell checkpoint system is a characteristic feature of cancer cells (see Figure 9.9).

The cell division cycle (CDC) genes that control 
the eukaryotic cell cycle were identified using genetic 
approaches in budding yeast (Figure 9.12). These single-celled eukaryotes use a relatively simple version of the same cell cycle control mechanism that operates in multicellular animals, including mammals. In the 1970s and 1980s, genetic studies on yeast helped scientists to understand
how the key CDC genes and proteins control the cell 
cycle and execute a cell cycle checkpoint mechanism 
that is common to all eukaryotic cells. 


The eukaryotic cell cycle is controlled by a series 
of biochemical reactions involving cell division cycle 
proteins that trigger, coordinate, and carry out the 
key events in the cell cycle. The cell cycle plays a key 
role in cancer development; loss of cell cycle control 
is a defect common to cancer cells. In normal cells, 
the feedback mechanisms work efficiently because 
components of the cell cycle machinery interact with 
cell cycle regulatory proteins in a highly interconnected 
network that continually responds to internal and exter- 
nal signals. The cell cycle is controlled by proteins that 
perform cell surveillance and transmit signals informing 
on the status of the genome each time the cell transi- 
tions from one stage of the cell cycle to the next. 

The main checkpoints occur when the cell tran- 
sitions between stages, such as between G1 and S 
(G1/S) and between G2 and M (G2/M). At the check- 
points, the cell gathers information and responds by 
regulating the progress of the cycle (see Figure 9.9). 
For example, at the G1/S checkpoint, also called the 
“restriction point” in mammalian cells and “Start” in 
budding yeast, the cell can either continue to grow in 
overall size or it can commit to a cell division path- 
way, depending on factors such as nutrient availability 
and lack of genome damage. If conditions at the G1/S 
checkpoint are satisfactory, then the cell division cycle 
will continue. At the G1/S transition, however, some 
cells have the option to exit the cell cycle and enter 
a nondividing (or resting) cell state called G0. Some highly developed cells in G0 can exist in a quiescent state in the human body for long periods of time. At the G2/M checkpoint, the cell must confirm that the chromosomes have replicated properly in S phase and
are ready for the critically important step of chromosome 
segregation in mitosis. Incomplete chromosome dupli- 
cation or mistakes in DNA replication cause genome 
damage that can trigger a cell cycle delay at the G2/M checkpoint. During mitosis, a related spindle (centromere) checkpoint monitors molecular events at the centromeres, confirming that each chromosome is physically attached to spindle fibers before the duplicated chromosomes are segregated.
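The checkpoints described above each ask a yes-or-no question before letting the cycle advance. This can be summarized as three small decision functions. In the toy Python sketch below, the `CellStatus` flags are hypothetical stand-ins for what the surveillance proteins report. Putting all the flags in one record is a simplification, because in a real cell each checkpoint is evaluated at a different time in the cycle.

```python
from dataclasses import dataclass

@dataclass
class CellStatus:
    # Hypothetical flags summarizing what surveillance proteins report.
    growth_conditions_met: bool = True   # e.g., enough nutrients and cell size
    dna_damaged: bool = False
    replication_complete: bool = False
    all_chromosomes_attached: bool = False
    signal_to_rest: bool = False         # e.g., a specialized cell leaving the cycle

def restriction_point(s: CellStatus) -> str:
    """G1/S checkpoint ("restriction point" in mammals, "Start" in yeast)."""
    if s.signal_to_rest:
        return "exit to G0"
    if s.dna_damaged:
        return "delay in G1: repair DNA"
    if not s.growth_conditions_met:
        return "stay in G1: conditions not yet met"
    return "commit: enter S phase"

def g2_m_checkpoint(s: CellStatus) -> str:
    """G2/M checkpoint: the genome must be undamaged and fully replicated."""
    if s.dna_damaged or not s.replication_complete:
        return "delay in G2: repair or finish replication"
    return "enter mitosis"

def spindle_checkpoint(s: CellStatus) -> str:
    """Checkpoint during mitosis: every chromosome attached to spindle fibers."""
    if not s.all_chromosomes_attached:
        return "wait: attach every chromosome to spindle fibers"
    return "segregate chromosomes"

healthy = CellStatus(replication_complete=True, all_chromosomes_attached=True)
damaged = CellStatus(dna_damaged=True)
for name, status in [("healthy", healthy), ("damaged", damaged)]:
    print(name, "|", restriction_point(status), "|",
          g2_m_checkpoint(status), "|", spindle_checkpoint(status))
```

The functions only report a decision. In a cell, a "delay" answer would be followed by DNA repair or further replication, and the checkpoint would be evaluated again.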


CELL-CYCLE MACHINE: CYCLINS AND 
CYCLIN-DEPENDENT KINASES PROMOTE 
MITOSIS 


Cell cycle control requires the activities of many impor- 
tant proteins with special functions. Two key types of 
proteins, the cyclins and cyclin-dependent kinases 
(Cdk), play essential roles in cell cycle control and 
cancer prevention (suppression) (Figure 9.13). Cyclins are cell-cycle regulator proteins named for the way the amount of cyclin protein in the cell cycles, increasing and decreasing as the cell cycle progresses (Figure 9.14). As the concentration of the cyclin proteins increases in the cell, cyclin binds to Cdk and makes a cyclin-Cdk complex called MPF (maturation promoting factor or M-phase promoting factor). MPF is an active protein kinase enzyme complex, which regulates the activity of many key cellular proteins by adding phosphates to them.
The cyclic nature of the cell cycle reflects periodic 
fluctuations in the cellular concentrations of cyclin 
proteins and the enzymatic activities of MPF (Cdk-cyc- 
lin) complexes, which determine the timing of the suc- 
cessive events of the cell cycle (Figure 9.14). 
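The rise and fall of cyclin levels, and the way MPF activity follows them, can be simulated with a toy Python model. The model makes several simplifying assumptions. Cdk is always present. Cyclin is made at a constant rate during interphase. MPF switches fully on once cyclin reaches a threshold. Cyclin is destroyed at the end of mitosis. The threshold and step counts are arbitrary, hypothetical values rather than measured ones.

```python
def simulate_cyclin(steps: int, synthesis_per_step: int = 1,
                    threshold: int = 10, mitosis_steps: int = 2):
    """Toy cyclin/MPF oscillator; returns a list of (step, cyclin, mpf_active)."""
    cyclin = 0
    mitosis_left = 0
    trace = []
    for t in range(steps):
        if mitosis_left == 0 and cyclin >= threshold:
            mitosis_left = mitosis_steps  # enough cyclin binds Cdk: MPF switches on
        mpf_active = mitosis_left > 0
        trace.append((t, cyclin, mpf_active))
        if mpf_active:
            mitosis_left -= 1
            if mitosis_left == 0:
                cyclin = 0                # cyclin degraded in mitosis: MPF switches off
        else:
            cyclin += synthesis_per_step  # cyclin accumulates in G1, S, and G2
    return trace

for t, cyclin, mpf_active in simulate_cyclin(26):
    bar = "#" * cyclin
    print(f"{t:2d} {bar:<10} {'MPF active (mitosis)' if mpf_active else ''}")
```

The printed bars show a sawtooth pattern. Cyclin climbs through interphase, MPF turns on at the threshold, and cyclin then drops back to zero before the next cycle begins.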

The term “cancer” actually represents many different, 
closely related diseases that can affect almost any type 
of human cell. Genetic studies indicate that no two can- 
cers are exactly the same, even when the same types of 
cells develop into tumors. In most cases, cancers form as 
a result of multiple mutations in a cell’s DNA genome, 
which cause defects that interfere with important bio- 
chemical pathways and protein functions in the cell. 
Acquiring multiple genetic mutations in the cell’s genome 
is a key step in the development of cancer. Understanding 
the biochemical steps involved in controlling the cell 
division cycle is critical to understanding how cancer 
develops. Cancer cells undergo profound changes in 
gene expression to overcome the controls that normally 
restrict cell division, allowing the cancer cells to repro- 
duce without limit and eventually migrate to other places 
in the body to establish new tumors (metastasis). 


Early Cancers Need Nutrients 


Cells in the human body need a constant supply of oxy- 
gen and nutrients delivered by the circulatory system. 
The tissues are full of many tiny blood vessels called 
capillaries that come in close contact with most cells. 
When the cancer cells are dividing rapidly, they 
put additional demands on the supplies of oxygen 
and nutrients. The oxygen-starved tissues release signal molecules to promote angiogenesis, the growth of new blood vessels (Figure 9.15). The observation by Dr. Judah Folkman that cancer cells require access to a blood supply was an important discovery because it raised the idea that anticancer drugs could be developed that would block angiogenesis and inhibit cancer cell growth.

FIGURE 9.13 The eukaryotic cell cycle is controlled by cyclin proteins, Cdk enzymes, and (Cdk-cyclin) kinase (MPF). (1) Cyclin proteins are synthesized during the G1, S, and G2 phases and accumulate in the cell. (2) A critical concentration of cyclin proteins is reached at the transition from G2 to M phase, called the G2/M checkpoint (red bar). (3) At this point in the cell cycle the cyclin proteins bind to the inactive Cdk proteins to make active MPF (Cdk-cyclin) kinase. MPF is an enzyme that adds phosphates to key cell cycle control proteins. (4) This triggers a signal cascade mechanism that permits the cell to transit the G2/M checkpoint and begin mitosis (M). (4) Once mitosis is underway, the activity of the MPF (Cdk-cyclin) kinase enzyme is turned off when the cyclin proteins bound to Cdk are degraded. (5) In the beginning of the next cell cycle, newly synthesized cyclin proteins start to accumulate and bind to the Cdk proteins to form active MPF kinase enzymes at the G2/M checkpoint.

FIGURE 9.14 The number of cyclin proteins changes during the cell cycle. (1) The concentration of cyclin proteins increases during G1 and S phases. (2) At the G2/M transition the abundant cyclin proteins bind to inactive Cdk proteins to make active MPF kinase (Cdk-cyclin) enzymes. (3) At the metaphase-anaphase transition point in mitosis, the chromosomes segregate and the cyclin proteins are rapidly degraded, which inactivates the MPF kinase enzyme. (4) In the next cell cycle, the process begins again. At the G2/M transition, the cyclins and Cdk proteins form the active MPF kinase enzymes needed to initiate mitosis.


Metastasis is Inefficient but Deadly 


The early diagnosis and treatment of cancer is often 
the best chance for a complete cure for most can- 
cer patients; early cancer detection is essential. The 
majority of cancer deaths are caused by cells that have 
metastasized; these cancer cells are released from a primary tumor and travel through the blood and lymph system to begin new tumors at distant parts of the body. The much less dangerous benign tumors (benign neoplasms) grow in only one location in
the body and do not metastasize. Benign tumors can 
cause illness and even death if they interfere with the 
functions of vital organs such as the brain. However, 
benign tumors are almost never as lethal as malignant 
tumors that have spread from a primary site to another 
location. 
FIGURE 9.15 The growth of new blood vessels is needed for cancer to spread in the body. (A) Small tumors emit signaling molecules that
promote the growth of new blood vessels (angiogenesis) and provide the nutrients needed for the small tumor to grow into larger tumors. 
(B) Cancer cells shed from the growing tumor travel to other locations through the blood vessels, which spreads the cancer in the body. 


Box 9.1 Dr. Judah Folkman 


World-Famous Cancer Researcher and Free Thinker 

Dr. Judah Folkman, a world-famous cancer researcher 
whose insights led to whole new fields of medicine, died 
in 2008 at age 74. Folkman worked as a cancer doctor and 
researcher at Children’s Hospital in Boston for 36 years. His 
free-thinking, persistent style often went against the grain of 
the conservative medical community, and Folkman became a 
frequent target of criticism in 1971 after he proposed that pre- 
venting angiogenesis could inhibit tumor growth by starving 
the cancer cells for nutrients. At the time, the dogma among 
cancer specialists overwhelmingly favored surgery and toxic 
chemotherapy drugs to stop cancer from spreading, so other 
approaches including Folkman’s idea were largely discounted. 
When Folkman’s research made national headlines in 1972, 
he was accused of offering people false hope for break- 
throughs in future cancer treatments. “If your idea succeeds, 
everybody says you're persistent,” Folkman liked to joke. “If 
it doesn’t succeed, you're stubborn” (Figure 9.16). Folkman 
did not give up trying to prove his theory. He performed 


experimental cancer treatments in mice that began to con- 
vince his critics. Folkman and many other researchers began 
to search for agents that could block the formation of tumor 
blood vessels in humans. Despite this interest and the poten- 
tial of the discovery as an effective cancer treatment, the first 
natural angiogenesis inhibitor, thrombospondin, was not 
identified until 1989, followed by angiostatin in 1994 and 
endostatin in 1997. The vascular endothelial growth factor 
(VEGF) protein binds to VEGF receptors, which are tyrosine kinase enzymes, and regulates angiogenesis during biological processes such as wound
repair, embryonic development, bone formation, and repro- 
ductive changes. VEGF expression is up-regulated (increased) 
by mutant oncogenes that permit the cancer cells to use VEGF 
to recruit new blood vessels. 

Building on Folkman’s pioneering research, Genentech developed the anti-VEGF drug Avastin, a monoclonal antibody directed against VEGF proteins. In 2004 the Food and Drug Administration (FDA) approved Avastin to treat solid tumors, and it effectively
FIGURE 9.16 Dr. Judah Folkman’s research led to the discovery of more than 10 new cancer treatment drugs and opened new fields of medical research. Dr. Folkman’s insights into the role of blood vessel growth in cancer development also gave us breakthrough treatments for age-related macular degeneration, a leading cause of blindness in humans. In addition, Dr. Folkman played an important role in the development of a new form of birth control that is implanted under a woman’s skin.


Metastasis, the spread of cancer cells around the 
body, is more like a planned invasion and not a pas- 
sive process. The cancer cells in the small primary 
tumor enter the bloodstream by secreting enzymes 
that digest the extracellular components surrounding 
the blood vessel cells, which allows the cancer cells 
to squeeze between cells and enter the bloodstream 
(Figure 9.15). The metastasizing cancer cells circulate 
through the body looking for a place to reenter the tis- 
sues and start tumor growth in the new location. Some 
metastasized cells develop into a tumor made of the 
same types of cells as the cells in the primary tumor. 
For example, when bone cancer cells metastasize to 
the liver, the secondary tumor in the liver is made up 
of bone cancer cells that grow in the liver. Overall the 
process of metastasis is inefficient, but it is a successful 
way to spread the disease because millions of cancer 
cells can be released from a tumor each day. If only 
a fraction of metastasized cells survive to form a new 
tumor, the odds favor the growth of a secondary tumor 
during an individual’s lifetime. Cancer cells can also 
travel through the lymph system, which is an extensive 
network of fluid that flows throughout the body. The 
timing of the movement of cancer cells into the lymph 
nodes in patients is one measure used in the detection 
and staging of individual cancers. 
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added to the life expectancy of patients with advanced colon 
cancer, even without chemotherapy. Unfortunately, Avastin 
is not the one-size-fits-all cancer cure that many hoped for. 
Even Folkman conceded that it is much harder to cut off the 
blood supply to a tumor than he once believed. “The ideas 
are simple, but getting them figured out is very complicated.” 
In 2008, Avastin was approved to treat advanced breast 
cancers. Researchers around the world are studying at least 
10 different antiangiogenesis drugs, hoping that treatments 
with combinations of angiogenesis inhibitors will keep tumors 
from growing for extended periods of time. 

Folkman’s genius extends beyond his inspired work on 
cancer to the development of effective treatments for blind- 
ness. Angiogenesis inhibitors can help patients to success- 
fully avoid the serious vision problems and blindness caused 
by age-related macular degeneration in humans. In this eye 
disease, new blood vessels grow into and destroy the central 
part of the retina. VEGF protein expression is prevented by 
RNAi gene therapy treatments in people (see Chapter 11). The 
angiogenesis inhibitors restored vision in people blinded by 
macular degeneration; one grateful patient donated $100,000 
to Children’s Hospital. 
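The metastasis discussion argues that an inefficient process can still succeed because so many cells are released. That is a probability argument, and a minimal sketch makes it concrete. It assumes, as a simplification, that every circulating cell independently has the same small chance p of founding a secondary tumor, so the chance that at least one of n cells succeeds is 1 - (1 - p)^n. The function name and the values of p and n are hypothetical illustrations, not measured data.

```python
import math


def prob_at_least_one_tumor(p_survive: float, n_cells: int) -> float:
    """Chance that at least one of n_cells independently founds a tumor.

    Simplifying assumption: every circulating cell has the same,
    independent chance p_survive of forming a secondary tumor.
    """
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate when p is tiny
    return -math.expm1(n_cells * math.log1p(-p_survive))


# Hypothetical numbers chosen only for illustration
p = 1e-7              # one in ten million released cells founds a tumor
per_day = 1_000_000   # cells shed per day ("millions" in the text)

for days in (1, 10, 100):
    risk = prob_at_least_one_tumor(p, per_day * days)
    print(f"{days:>3} days: {risk:.3f}")
```

With these made-up numbers, one day of shedding gives roughly a 10% chance of a secondary tumor, ten days about 63%, and a hundred days nearly 100%: each cell almost always fails, yet the sheer number of cells makes eventual success likely.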


GENES CONTROLLING CANCER: TUMOR SUPPRESSOR GENES AND ONCOGENES


Two main types of human genes are directly involved in the development of human cancers: oncogenes and tumor suppressor genes. Oncogenes are mutant forms of normal genes (proto-oncogenes) that control how often a cell divides and that also control genes regulating differentiation, the process by which cells acquire specialized functions. A mutation can change a harmless wildtype gene into a dangerous oncogene that drives uncontrolled cell division. Tumor suppressor proteins function in cells by controlling processes that have the potential to cause cancer if left unregulated. More than 100 oncogenes have been identified that encode proteins with five major functions (Figure 9.17):


• Growth factors. Specific growth factor proteins promote the growth of certain types of cells.

• Growth factor receptors. Growth factor receptor proteins are located on the surfaces of cells, where they detect and transmit signals between cells and into cells.

• Signal transducers. Signal transducers are signaling components that transmit signals between the growth factor receptors on the surface of the cell and the inside of the nucleus, where the genome can respond to the signal. Mutations in these components can prevent the cancer cell from controlling its own division.

• Transcription factors. Transcription factors turn gene expression on and off and regulate genes that control cell division. Mutations that overactivate the myc oncogene can stimulate rapid cell division in lung cancer, leukemia, and lymphoma.

• Programmed cell death regulators. Programmed cell death (PCD) proteins control apoptosis like a master switch that commands certain cells to commit suicide.

(Figure 9.17 diagram labels: growth factor; tyrosine kinase receptor; nucleus; protein that stimulates cell cycle.)

FIGURE 9.17 Proto-oncogene proteins normally function in the pathway that transmits signals into the cell. Proto-oncogenes encode several different types of proteins, including growth factors, growth factor receptors, signal transducers, transcription factors, and programmed cell death regulators. These proteins normally transmit signals from the outside surface of the cell to the genome in the nucleus. However, DNA mutations can convert the proto-oncogenes into dangerous oncogenes.


p53: A Tumor Suppressor Superstar 


Cells normally benefit from the functions of the p53 
tumor suppressor protein. Expression of the p53 gene, 
named for the mass of the encoded protein (53 kD), 
is essential to protect the body from cancer. The p53 
protein is involved in the checkpoint feedback sys- 
tem that safeguards the genome and acts to control 
the cell cycle to avoid cancer cell development. Cells with mutant p53 proteins continue to divide even with damaged genome DNA because, without functional p53, they fail to trigger apoptosis. The genomes in these mutant cells accumulate additional mutations and become unstable. Science magazine chose p53 as its Molecule of the Year in 1993, reflecting the key role that the p53 tumor suppressor protein plays in these biological processes.

FIGURE 9.18 The p53 protein binds to the DNA genome and turns on expression of the p21 gene. Four identical p53 proteins (shown in gold, blue, green, and magenta) make up the p53 tetramer that binds to DNA (silver).

In normal cells, the wildtype p53 tumor suppressor protein functions in the nucleus, where it acts as a transcription factor. Transcription factors are a large family of proteins that control the expression of genes by regulating how many RNA copies are made from each gene. The p53 protein induces the expression of selected target genes by binding directly to chromosome DNA (Figure 9.18). p53 is the master control protein at the center of the large protein network that monitors the health and security of the cell and reports on the integrity of the cell’s DNA genome. p53 function is essential to the surveillance system that controls the cell cycle. The activity level of the p53 protein in the cell influences which pathway the cell will follow next: DNA repair or apoptosis.

When damaged genome DNA is detected, the p53 gene is expressed and p53 proteins are made. A tetramer of four identical p53 proteins binds directly to the control DNA at the start of the p21 gene, which turns on expression of the p21 gene (Figure 9.19). The p21 protein has two major functions in the cell. First, p21 inhibits the activity of the MPF (Cdk-cyclin) kinase enzyme, which is necessary for cells to pass from G1 into the S phase of the cell cycle. By blocking this kinase, the p21 protein prevents cells from progressing past G1 into S phase. As a result of this G1/S block, cells continue to grow in G1 until they reach the G1/S transition, where they stop growing and adopt a uniform cellular morphology; this state is called G1 arrest. Second, cells that have already passed the G1/S checkpoint and entered S phase also arrest, because the p21 protein inhibits the function of the DNA clamp protein, proliferating cell nuclear antigen (PCNA), which is required for DNA replication (Figure 9.20).

FIGURE 9.19 The p53 and p21 proteins act to halt the cell cycle at the G1/S cell cycle transition. (1) When the cell’s genome DNA is damaged, the p53 gene is turned on and the p53 protein is synthesized. (2) The p53 tetramer binds to the control region of the p21 gene, which triggers p21 protein production. (3) The p21 protein blocks Cdk-cyclin (MPF) activity and halts the cell cycle at the G1/S transition. (4) The p21 proteins also inhibit the activity of the PCNA protein and block DNA replication in S phase. A similar p53 pathway controls entry into mitosis (M).

FIGURE 9.20 Proliferating Cell Nuclear Antigen (PCNA) is a DNA clamp protein complex that is involved in replicating the cell’s DNA genome. The structure of the PCNA DNA clamp protein is shown in ribbon form (A) and space-fill form (B). The shape of the PCNA protein complex is a clue to its function in the cell: the DNA clamp protein “clamps” around the DNA helix during the process of DNA replication (C).

The p53 gene plays a key role in the development of most types of cancers; over 50% of human cancers carry mutations in the p53 gene. If the mutant p53 protein cannot bind properly to DNA, it fails to induce expression of the p21 gene. Without the p21 protein, the MPF kinase remains active and the cell continues through the cell cycle instead of halting at G1/S. The p53 master switch can trigger cascades of biochemical events: acting through the p21 protein, it shuts down the cell cycle, and acting through other target genes, it can initiate apoptosis.
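The p53 to p21 to MPF chain described above is essentially a series of on/off decisions. The toy Boolean model below is a sketch, not a biochemical simulation: it assumes each protein is simply active or inactive, and the names `g1_s_checkpoint` and `CheckpointResult` are hypothetical. It shows why a cell with mutant p53 keeps dividing even when its DNA is damaged.

```python
from dataclasses import dataclass


@dataclass
class CheckpointResult:
    p21_made: bool        # was the p21 gene switched on?
    mpf_active: bool      # can the Cdk-cyclin (MPF) kinase still act?
    enters_s_phase: bool  # does the cell pass the G1/S checkpoint?


def g1_s_checkpoint(dna_damaged: bool, p53_functional: bool) -> CheckpointResult:
    """Toy on/off model of the p53 -> p21 -> MPF chain at G1/S."""
    # p53 turns on p21 only if DNA damage is detected AND the
    # p53 protein can bind DNA properly.
    p21_made = dna_damaged and p53_functional
    # p21 inhibits the Cdk-cyclin (MPF) kinase.
    mpf_active = not p21_made
    # Without active MPF, the cell halts in G1 (G1 arrest).
    return CheckpointResult(p21_made, mpf_active, enters_s_phase=mpf_active)


for damaged in (False, True):
    for p53_ok in (True, False):
        r = g1_s_checkpoint(damaged, p53_ok)
        print(f"damage={damaged!s:5} p53 ok={p53_ok!s:5} -> enters S phase: {r.enters_s_phase}")
```

Of the four combinations, only "DNA damaged and p53 functional" stops the cell at G1/S; a damaged cell with mutant p53 passes straight into S phase, which is how mutations accumulate in p53-mutant tumors.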


Retinoblastoma (Rb) is a rare childhood cancer of the eye that affects a few hundred children in the United States annually and accounts for about 3% of all cancers in children under age 15. If the cancer is detected early enough, Rb tumors can be treated successfully by radiation therapy, laser surgery, or cryotreatment (freezing), and in many cases the child’s vision can be preserved. However, if left undiagnosed, Rb cancers can metastasize and spread to the brain and central nervous system (Figure 9.22).

Much like the p53 tumor suppressor protein, the Rb protein performs a checkpoint surveillance function that protects the genome by detecting damaged DNA and delaying the cell cycle to provide time for DNA repair. The Rb protein also plays a key role in triggering apoptosis. If the genome damage is not repaired, the cell can trigger its own destruction through apoptosis, providing a way for the body to eliminate cells that would pose a greater risk if allowed to grow without controls. Alterations in cell functions that reduce or block apoptosis will significantly alter cell development and fate in the body.

In healthy children, each retinal cell carries two copies of chromosome 13 with two wildtype Rb genes. Studies on the genetics of retinoblastoma tumors show that the genomes of affected children have a deletion mutation in human chromosome 13, which removes one of the two Rb genes (Figure 9.23). Retinal cells that carry one wildtype and one mutant Rb gene are unaffected and remain healthy, but retinal cells that carry two mutant copies of the Rb gene are destined to develop into a retinoblastoma tumor. Genetic testing performed in infancy can detect chromosome 13 mutations, which is a big step toward the early detection and treatment of Rb cancer.

In the Eye of a Child: A Real-Life Retinoblastoma Story

NBA star Derek Fisher has a contract with the Lakers for millions, but even he couldn’t avoid a parent’s worst nightmare. He is new to the fight against childhood cancer, which for Fisher began during the playoff games between the Utah Jazz and the Golden State Warriors (Fisher played for Utah). Fisher arrived extremely late to game 2 because of a family emergency. Fisher and his wife, Candace, have a daughter, Tatum. When Tatum was 10 months old, Candace Fisher noticed an odd reflection in one of her daughter’s eyes (Figure 9.21). Soon after, Tatum was diagnosed with an advanced retinoblastoma (Rb) tumor in her left eye. Rb is the most common childhood cancer of the eye, with 350 cases diagnosed in the United States each year. Immediate treatment usually results in an excellent prognosis for these children, but the doctors needed to decide whether they should remove Tatum’s left eye entirely or try a risky, cutting-edge procedure called intraarterial chemotherapy (IAC), developed by Drs. David Abramson and Pierre Gobin at New York Presbyterian Hospital. IAC delivers chemotherapy drugs directly to the tumor in the hope of shrinking it and saving the eye. After careful consideration, Derek Fisher and his wife decided that the intraarterial chemotherapy procedure was the best chance they had to cure the advanced cancer and save Tatum’s eye.

The IAC treatment involved injecting a very high dose of chemotherapy into the artery leading into Tatum’s cancerous left eye, allowing the blood to carry the chemotherapy agents directly to the cancer cells. At the time, IAC had been performed only a few times and had not yet been reported in medical journals.

Derek Fisher arrived at game 2 of the playoffs under police escort during the third quarter and was greeted with a standing ovation from the fans. Once in the game, Fisher forced a crucial turnover and hit a three-point shot that propelled his team to win the game. Time will tell whether the IAC treatment will also succeed for Tatum, helping her to win the most important game of her life. Four months after the treatment, Tatum was responding well to the chemosurgery, and her prognosis was excellent.

FIGURE 9.21 In the eyes of a child: a real-life retinoblastoma story. (A) Basketball star Derek Fisher fights for his daughter’s life. (B) Tatum has a retinoblastoma tumor.


Proto-oncogenes Cause Human Cancers 


Wildtype proto-oncogenes are normally harmless genes carried in all human genomes, but a random mutation can convert a harmless proto-oncogene into a dangerous oncogene. These mutant oncogene proteins drive abnormal cell growth by increasing the transcription (expression) of selected genes. The mutant src and ras oncogenes are good examples of activated proto-oncogenes, which are continuously expressed (transcribed) in tumor cells, causing the cells to lose cell cycle control and divide much more rapidly than normal.

FIGURE 9.22 Retinoblastoma tumor growing in the eye of a young child.

FIGURE 9.23 Retinoblastoma tumor cells are missing both Rb genes. (A) Normal cells in the retina of the human eye contain two normal Rb genes, one on each copy of human chromosome 13. (B) Retinal cells at risk for developing cancer have only one normal Rb gene because the second Rb gene was removed from the chromosome by a deletion in the q14 region of human chromosome 13. (C) Retinal cells that are missing both Rb genes grow out of control and develop into retinoblastoma tumor cells.

Src was the first oncogene to be identified. The src gene codes for a tyrosine kinase enzyme that catalyzes the transfer of phosphate groups onto tyrosine amino acids in certain target proteins. The addition and removal of phosphate groups on proteins can act like a master switch that controls the activities of the target proteins. Healthy cells make only low levels of the src protein, which functions to transmit signals to the nucleus. However, mutations that alter src gene expression cause the protein to be overproduced and are linked to neuroblastoma and to lung, colon, and breast cancers.

Receptor proteins sitting on the surfaces of healthy breast cells receive and transmit signals from neighboring cells to the cell nucleus. The human epidermal growth factor receptor-2 (Her-2) protein is one such receptor located on the surfaces of breast cells. Her-2 binds to specific growth factor proteins and functions in cell-to-cell signaling. Some types of breast cancer cells overproduce the Her-2 protein, flooding the surfaces of the cells with receptor proteins and triggering uncontrolled cell division. Certain breast cancer treatments are designed to block the overexpressed Her-2 protein on cancer cells, including Herceptin (trastuzumab), which was made by the biotechnology company Genentech and approved by the FDA in 1998. Herceptin is another example of a drug that is actually a monoclonal antibody designed to bind to a specific protein, in this case, Her-2. The binding of the antibody to the Her-2 protein blocks the activity of the Her-2 protein and inhibits transmission of the signals telling the cell to divide. Herceptin treatment kills the rapidly dividing breast cancer cells that are making large amounts of Her-2 protein. Not all types of breast cancers overexpress Her-2, but the “Her-2 positive” breast cancers are usually more aggressive and have a higher risk of recurrence than breast cancers that test negative for Her-2 overexpression.

The ras proto-oncogene and protein are also involved in the kinase-mediated signaling pathways that control the expression of certain genes in the cell, which, in turn, control cell division and cell development. The ras protein acts like a switch controlling a cellular pathway; when ras binds to guanosine triphosphate (GTP) in the cell, the pathway is turned on. To turn the pathway off, the ras protein must convert the bound GTP into guanosine diphosphate (GDP). However, a mutation in the ras proto-oncogene causes the production of mutant ras oncoproteins that cannot switch off in this way, and as a result the pathway becomes stuck in the “on” position, leading to uncontrolled cell growth and proliferation (Figure 9.24). The consequences of mutant ras oncogenes are devastating to the cell because the ras protein is normally involved in so many critically important signaling pathways. Ras mutations have been identified in many different cancers, including cancers of the pancreas (90%), colon (50%), lung (30%), thyroid (50%), bladder (6%), and ovary (15%), as well as breast, skin, liver, and kidney cancers and some blood cancers, such as leukemia.
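The ras behavior can be pictured as a two-state switch. A minimal sketch, assuming a simplified protein that is either GTP-bound (pathway on) or GDP-bound (pathway off); in the cell, normal ras turns itself off by converting its bound GTP to GDP. The class `RasSwitch` and its methods are hypothetical names for illustration, not a kinetic model. The mutant version simply cannot turn itself off.

```python
class RasSwitch:
    """Toy two-state model of the ras protein (not a kinetic model)."""

    def __init__(self, mutant: bool = False):
        self.mutant = mutant  # mutant ras oncoprotein cannot switch off
        self.bound = "GDP"    # GDP-bound ras = pathway off

    @property
    def pathway_on(self) -> bool:
        return self.bound == "GTP"

    def receive_signal(self) -> None:
        # A growth signal loads GTP onto ras, turning the pathway on.
        self.bound = "GTP"

    def switch_off(self) -> None:
        # Normal ras converts its GTP to GDP and turns the pathway off;
        # the mutant oncoprotein cannot, so it stays stuck "on".
        if not self.mutant:
            self.bound = "GDP"


for mutant in (False, True):
    ras = RasSwitch(mutant)
    ras.receive_signal()
    ras.switch_off()
    print("mutant" if mutant else "normal", "pathway on after signal ends:", ras.pathway_on)
```

After the same signal-then-off sequence, the normal switch returns to "off" while the mutant stays "on", which is the stuck switch the text links to uncontrolled growth.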

The myc proto-oncogenes encode a transcription factor protein that normally regulates the expression of several different genes (Figure 9.25). The myc proto-oncogenes are activated into myc oncogenes by gene rearrangements or by DNA amplification events involving the breakage, rearrangement, and duplication of regions of chromosome DNA. These types of chromosome changes have the potential to alter the normal expression of many genes. Myc oncogenes are linked to several types of cancer, including Burkitt’s lymphoma, B-cell leukemia, and lung cancers. A specific DNA translocation from one chromosome to another causes overexpression of the myc gene (and protein) and eventually leads to a B-cell cancer.

FIGURE 9.24 Ras oncoprotein. The ras proto-oncogene and ras protein are involved in the kinase-mediated signaling pathways that control the expression of certain genes.

FIGURE 9.25 Myc protein binds to the DNA helix. The myc proto-oncogene encodes a transcription factor protein that normally regulates the expression of several different genes. The alpha helices in the Myc protein are shown in light purple; the DNA helix is shown multicolored.
The proteins involved in DNA replication can successfully copy the chromosome DNA until they reach the telomeres at the ends of the chromosomes, where the DNA polymerase enzyme cannot completely copy the DNA strands located at the extreme ends of the linear chromosomes (Figure 9.26). This complication would cause the progressive shortening of the chromosome ends with each cell division, with successive divisions potentially deleting essential genes and killing the cell. The problem of how to replicate the ends of the chromosomes is solved by a novel enzyme called telomerase, a complex composed of a protein bound to an RNA molecule. The hTERT gene encodes the human protein component of telomerase, and the short RNA component is encoded by a separate gene elsewhere in the genome. Telomerase replicates the DNA at the ends of linear chromosomes using a completely different molecular mechanism than that used by DNA polymerase. The telomerase enzyme is made in large amounts in rapidly dividing cells, such as those in a developing fetus, but most adult cells divide less often and as a result do not need much telomerase. Cells with limited amounts of telomerase can enter senescence (G0 in the cell cycle), which represents long-term growth arrest in G1.

FIGURE 9.26 Telomeres are the ends of linear human chromosomes. (A) Telomere proteins (yellow) are located on the ends of the blue chromosomes. (B) The RNA component of telomerase, an enzyme made up of protein and RNA components.


Since cancer cells divide rapidly, they also produce high levels of telomerase enzyme. Like myc and src, hTERT is a proto-oncogene that functions as an oncogene when mutated and promotes uncontrolled cell growth.
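The end-replication problem is easiest to see as simple arithmetic. The sketch below assumes a hypothetical starting telomere length, a fixed loss per division, and a critical length at which the cell stops dividing; all three numbers, and the function name `divisions_until_arrest`, are illustrative placeholders, not measured values. A nonzero telomerase repair amount shows why cells that make plenty of telomerase, such as cancer cells, can keep dividing.

```python
from typing import Optional


def divisions_until_arrest(start_bp: int, loss_per_division: int,
                           critical_bp: int, telomerase_bp: int = 0,
                           max_divisions: int = 10_000) -> Optional[int]:
    """Count divisions before telomeres fall below a critical length.

    Lengths are in DNA base pairs (bp). Returns None if telomerase keeps
    pace and the critical length is never reached within max_divisions.
    """
    length = start_bp
    for division in range(1, max_divisions + 1):
        # Each division loses some DNA at the chromosome end,
        # and telomerase (if present) adds some back.
        length = length - loss_per_division + telomerase_bp
        if length < critical_bp:
            return division
    return None


# Hypothetical values for illustration only
print(divisions_until_arrest(10_000, 100, 5_000))                     # no telomerase
print(divisions_until_arrest(10_000, 100, 5_000, telomerase_bp=100))  # full repair
```

With these made-up numbers, a cell without telomerase stops after 51 divisions, while a cell whose telomerase fully replaces the lost DNA never reaches the limit (the function returns None).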


In normal cells, tumor suppressor proteins function to limit the frequency of cell division and control processes such as gene expression, DNA repair, and cell-cell communication. Mutations that alter the function of the tumor suppressor proteins can lead to abnormal cell division and cancer development. Most colon cancers, for example, develop slowly over a period of years, allowing time for multiple genetic changes to accumulate and convert the normal colon cells into tumor cells. About 5% of all colon cancers are the result of inherited genetic abnormalities that are characterized by the early appearance of polyps in the colon, growths that have the potential to become a colon cancer. Of the several kinds of inherited colon cancers, the most common are familial adenomatous polyposis (FAP, caused by mutations in the APC gene) and hereditary nonpolyposis colon cancer (HNPCC) (Figure 9.27). Nonhereditary colon cancers rarely develop before age 40 and are caused by sporadic genome mutations.

The APC protein, like many tumor suppressors, controls the expression of genes that are necessary for the cell-division process. A mutation in the APC gene on chromosome 5 can prevent expression of the APC protein, causing increased cell division and the development of colon cancer. Individuals who inherit a mutation in one of their two APC genes have a precancerous condition that causes many colon polyps to grow, with each polyp having the potential to develop into colon cancer. People who inherit preexisting mutations are at a much higher risk for developing colon cancer because these individuals require fewer genetic changes to convert a cell into colon cancer.

FIGURE 9.27 The genetics of hereditary colon cancer (HNPCC). If one parent carries a copy of the HNPCC gene mutation, there is a 50% chance that each child will inherit the HNPCC mutation and a 50% chance that each child will inherit two normal genes. Only one mutant copy of the gene is necessary to cause disease because HNPCC is an autosomal dominant disorder.
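The 50% figure for HNPCC in Figure 9.27 follows from listing the four equally likely ways a child can receive one allele from each parent. A minimal sketch, assuming one parent carries one mutant allele and one normal allele and the other parent carries two normal alleles; the allele letters "M" (mutant) and "n" (normal) and the function name are hypothetical shorthand. Because HNPCC is autosomal dominant, any child with at least one "M" is counted as carrying the mutation.

```python
from fractions import Fraction
from itertools import product


def dominant_inheritance_risk(parent1: str, parent2: str, mutant: str = "M") -> Fraction:
    """Fraction of children who inherit at least one dominant mutant allele.

    Each parent is written as two allele letters, e.g. "Mn" (carrier)
    or "nn" (two normal alleles). Every pairing of one allele from each
    parent is assumed to be equally likely.
    """
    children = [a + b for a, b in product(parent1, parent2)]
    affected = sum(1 for child in children if mutant in child)
    return Fraction(affected, len(children))


print(dominant_inheritance_risk("Mn", "nn"))  # carrier x non-carrier
```

For a carrier and a non-carrier, two of the four combinations contain "M", giving 1/2, the 50% chance shown in the figure.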


Early Detection and Treatment of Cancer Are Critically Important

The importance of early detection of colon cancer was brought to the public’s attention when Katie Couric, then a host of NBC’s Today show, had her colon screened with a colonoscopy test on live television (Figure 9.28). Couric’s husband had succumbed to colon cancer, a devastating loss that inspired her to start a campaign to inform the public about the importance of early detection for successful cancer treatment. Her decision to have her preventive colonoscopy performed on live morning television surprised everyone. A follow-up study revealed that Couric’s efforts to spread the word about the early detection of colon cancer had saved many lives. Because genetic testing is available, people who carry a mutant gene and face an increased risk of colon cancer can be monitored closely and often for the first signs of cancer, to take advantage of the benefits of early detection and treatment. More widespread genetic testing should increase the number of people who decide to seek early detection and treatment for many different types of cancers.

FIGURE 9.28 Early detection of colon cancer. (A) The colon is part of the large intestine. Sigmoidoscopy and colonoscopy are two methods doctors use to look for tumors. (B) Then NBC anchorwoman Katie Couric has her colon screened for cancer on live TV. Couric hosted a series on the NBC Today show called “Confronting Colon Cancer” to increase public awareness of colorectal cancer. She stressed the importance of early detection of cancer, including the critical step of undergoing a colon exam (colonoscopy) after age 50.

Human Breast Cancer Genes BRCA1 and BRCA2

According to the National Cancer Institute, more than 192,000 women in the United States are diagnosed with breast cancer each year (Figure 9.29), but only 5% to 10% of these women have an inherited form of breast cancer. These women have mutations in the BRCA1 or BRCA2 genes that make them more susceptible to developing breast cancer. The names of the BRCA1 and BRCA2 genes stand for breast cancer 1 and breast cancer 2, respectively (Figure 9.30). The proteins encoded by the wildtype BRCA1 and BRCA2 genes are involved in regulating gene expression and repairing genome damage. The BRCA1 and BRCA2 genes contain highly repeated DNA sequences that are prone to acquiring mutations. The BRCA proteins interact with transcription factor proteins that control the transcription of several genes, including the p53 and p21 genes. When the BRCA proteins do not function properly, the failure to correct DNA damage in the genome generates cells that may carry chromosome rearrangements and abnormal chromosome numbers, which promote the development of malignant cancer cells (Figure 9.31). Scientists estimate that about 13% of American women will develop breast cancer during their lifetimes, compared to 36% to 85% of women who inherit the mutant BRCA1 or BRCA2 alleles. These women have an increased risk of developing breast or ovarian cancers at a young age (before menopause), often have family members with the disease, and may also face an increased risk for colon cancer. BRCA2 mutations are also associated with an increased risk of lymphoma, melanoma, and cancers of the pancreas, gallbladder, bile duct, and stomach.

FIGURE 9.29 Breast cancers affect several different types of tissues. (Diagram label: lobules.)

In the general population, 1.7% of women will develop ovarian cancer during their lifetimes, compared to 16% to 60% of women with mutant BRCA1 or BRCA2 genes. Genetic research on BRCA1 and BRCA2 involved studies of large families with members affected by cancer and provided estimates of the cancer risks associated with inheriting the mutant BRCA1 or BRCA2 genes. However, because family members typically share common environments as well as genes, it is possible that the increased number of cancer cases in these families is due at least in part to other genetic or environmental factors unrelated to mutations in the BRCA1 or BRCA2 genes, so the increased risk in these families might not accurately reflect the levels of risk in the general population.
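The lifetime risk figures quoted above are easier to compare as ratios. The short calculation below uses only the percentages given in the text (13% versus 36% to 85% for breast cancer; 1.7% versus 16% to 60% for ovarian cancer) and expresses each carrier range as a multiple of the general-population risk. The dictionary and function names are hypothetical, and, as the paragraph cautions, these family-based estimates may overstate the risk for carriers in the general population.

```python
# Lifetime risk percentages quoted in the text: general population vs.
# women carrying a mutant BRCA1 or BRCA2 allele (low, high estimates).
RISKS = {
    "breast": (13.0, (36.0, 85.0)),
    "ovarian": (1.7, (16.0, 60.0)),
}


def relative_risk_range(general: float, carrier: tuple) -> tuple:
    """How many times higher the carrier risk is than the general risk."""
    low, high = carrier
    return low / general, high / general


for cancer, (general, carrier) in RISKS.items():
    low, high = relative_risk_range(general, carrier)
    print(f"{cancer}: about {low:.1f} to {high:.1f} times the general risk")
```

The output shows breast cancer risk for carriers at roughly 2.8 to 6.5 times the general risk, and ovarian cancer risk at roughly 9.4 to 35.3 times, so the relative jump is much larger for ovarian cancer even though its absolute risk is lower.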


Acquiring multiple spontaneous mutations in the same 
DNA genome is a rare event, which explains why cancers 
caused by multiple mutations take time to develop and tend 
to occur later in life. The total cancer risk for any individual 
depends on personal exposure to genetic and environmental 
risk factors and the individual's unique genetic makeup. 
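The claim that several independent mutations take time to accumulate, and that an inherited mutation shortens the path (as with the APC and BRCA genes above), can be illustrated with a deliberately simple probability model. It assumes, hypothetically, that each required gene in a cell lineage is mutated independently with the same small chance each year; the rate, the number of required hits, and the function name `prob_all_hits` are illustrative assumptions, not measured values.

```python
def prob_all_hits(rate_per_year: float, years: int, hits_needed: int) -> float:
    """Chance that a cell lineage has acquired all required mutations.

    Toy model (an assumption, not real data): each of `hits_needed` genes
    is mutated independently, with probability `rate_per_year` per year.
    """
    p_one_gene = 1 - (1 - rate_per_year) ** years  # gene hit at least once
    return p_one_gene ** hits_needed


rate = 0.01  # hypothetical per-gene, per-year chance
for age in (20, 40, 60, 80):
    sporadic = prob_all_hits(rate, age, hits_needed=3)
    inherited = prob_all_hits(rate, age, hits_needed=2)  # one hit already inherited
    print(f"age {age}: three hits needed {sporadic:.4f}, two hits needed {inherited:.4f}")
```

In this toy model, the chance of collecting every required mutation climbs steeply with age, and it is higher at every age when one mutation is inherited, mirroring the later onset of sporadic cancers and the earlier onset of inherited ones.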


Environmental Factors Contribute to Cancer Development

Many environmental factors contribute to the development of cancer cells, including some viruses carrying oncogenes that cause human cells to become cancerous. Epstein-Barr virus, for example, not only causes infectious mononucleosis (“kissing disease”) but is also implicated in the development of Burkitt’s lymphoma and nasopharyngeal cancers. Exposure to ultraviolet (UV) light, x-rays, and other ionizing radiation also damages the genome DNA and alters the genetic makeup of an organism (Figure 9.32). Many environmental agents are known to damage DNA and induce the development of cancer (carcinogenesis), including DNA alkylating agents (leukemia), asbestos (mesothelioma of the lining of the lungs), aromatic hydrocarbons and benzopyrene from air pollution (lung cancer), tobacco smoke (lung cancer, oral cavity and upper airway cancer, pancreatic cancer, esophageal cancer, bladder and kidney cancer), and vinyl chloride (angiosarcoma of the liver) (Figure 9.33). The human diet also plays an important role in the development of cancers of the gastrointestinal tract; the consumption of high levels of animal fat, food additives containing nitrates, some types of sugar substitutes, and chemicals associated with charbroiled meat is linked to gastrointestinal cancers. The role of hormones in cancer development is not yet clear, but excess amounts of the hormone estrogen can cause cancer in test animals, and a synthetic estrogen (diethylstilbestrol) has been linked to vaginal cancer in some daughters of women treated with the synthetic hormone during pregnancy.

FIGURE 9.30 BRCA genes and proteins. (A) The BRCA1 gene is carried on chromosome 17 and BRCA2 on chromosome 13. (B) The breast cancer 1 protein (BRCA1).

FIGURE 9.31 Malignant breast cancer cell. This breast cancer cell is capable of metastasis, spreading to other locations in the body.


CLINICAL TRIALS TO TEST HUMAN 
CANCER TREATMENTS 


Different kinds of clinical trials are used to test various 
methods of cancer prevention, screening, and treatment 
and to find ways to improve the quality of life for 
cancer patients and survivors (Figure 9.34). Several 
types of different clinical trials are available, and the 
public is allowed to participate: 


• Prevention trials test new ways to reduce the risk of
developing certain types of cancer, including the use
of medications, vitamins, and other supplements.
These trials can include cancer survivors who want
to prevent a recurrence or reduce the risk of
developing a different type of cancer.

• Screening trials test early detection methods designed
to find cancer before it metastasizes and becomes much
less treatable.

• Diagnostic trials study the effectiveness of different
detection protocols to identify early cancers more
easily.

• Quality-of-life (also called supportive care) trials
are designed to study approaches to improve the
quality of life for current cancer patients and cancer
FIGURE 9.32 Ionizing radiation
damages the DNA genome. (A)
Arrows indicate broken chromosomes.
(B) Radiation damages the
DNA in the chromosome.




FIGURE 9.33 Environmental sources of cancer-causing agents. The environmental sources of cancer-causing agents (carcinogens) that harm 
DNA include water and air pollution, nuclear power plants, and cigarette smoke. 


survivors. The issues of concern include nausea,
vomiting, depression, and sleep disorders, as well
as many other negative side effects of cancer and
cancer treatments.

• Genetic clinical studies focus on the genes involved
in specific cancers to determine how the genetics of
the cancer cells affect the way that the cells respond
to cancer treatments.

• The clinical trials that study the effectiveness of new
cancer treatments, including drugs, vaccines, surgery,
and chemotherapy, are organized into three
phases. Phase I trials are designed to test the safety
of drugs in human patients. Phase II trials extend the
safety information and test different doses of drug
treatments. Phase III trials involve patients with cancer
who are treated with a new drug or therapy with
the goal of comparing the effectiveness of the new
approach to the currently accepted treatment for a
given cancer.

FIGURE 9.34 Clinical trials to test new treatments for cancer
and other serious diseases.


FATE 2: DEVELOPMENT OF SPECIALIZED 
CELLS 


There are about 220 different types of cells in the adult
human body that perform many thousands of diverse
cellular functions. Highly specialized cells perform
many important jobs in the human body: islet cells in the
pancreas produce insulin, nerve cells send impulses in
the brain, and stomach cells secrete digestive enzymes,
to name just a few (see Chapter 12). These biological
activities require a cast of thousands; the average
human body contains about 60 trillion to 100 trillion
cells. Highly specialized cells usually develop from
precursor cells through the processes of differential gene
expression and cellular differentiation. Under normal
circumstances, most cells in the human body are fully
differentiated into a specialized shape and are permanently
committed to performing a certain function; highly
differentiated cells are unable to revert to an
undifferentiated, nonspecialized state. However, many
types of precursor cells retain the capacity to undergo
developmental changes, for example, in response to a
threat. When fighting an infection, the body’s immune
system produces highly specialized cells that recognize
and kill the invading pathogen. In response to the
infection, immune precursor cells in the bone marrow
differentiate and form the mature immune cells that produce
antibodies against the invading microbe.


Specialized Cells Are Generated
by Stem Cells


To better understand the complex processes involved in
developing various specialized types of human cells, it
is useful to start at the beginning and review how cells
develop and change during embryogenesis. Development
starts when a sperm fertilizes an egg cell, and the resulting
zygote contains two genomes, one from each parent. The
zygote divides by mitosis and makes a ball containing a
small number of genetically identical embryonic cells. Four
to five days after fertilization, the growing human embryo
has taken the form of a blastocyst, a hollow ball that
consists of about 200 cells and contains a handful of (the
amazing) embryonic stem cells (ESCs) growing inside (Figure 9.35).

Embryonic stem cells are special because they give
rise to all of the different types of cells that will eventually
make a human body. The ESCs have the genetic potential
to perform all of the different biochemical reactions
required in the body and can develop into every cell type
needed. Embryonic stem cells have more developmental
potential than other types of stem cells. By comparison,
adult blood stem cells, which are found in the bone
marrow of adults, are capable of developing into all the
different types of cells found in the circulating blood
(Figure 9.36), but they cannot develop into nerve or muscle
cells. The unlimited developmental potential of embryonic
stem cells has made them the focus of biomedical
research across the globe (see Chapter 12). During normal
human embryogenesis, the ESCs divide, differentiate,
and migrate to form an embryo with three distinct layers
FIGURE 9.35 ESCs develop into different types of cells. During embryonic development, the embryonic stem cells (ESCs) develop into a
large variety of diverse cells that make up the brain, muscles, stomach, bone, and all other tissues in the human body. Inset shows a
blastocyst (left) and cross-section of a blastocyst (right) showing ESCs inside.
FIGURE 9.36 The adult human body has multipotent stem cells. The
adult human body contains many examples of precursor cells that 
each develop into certain subsets of cells. However, these cells are 
different from the ESCs because the multipotent cells cannot develop 
into all of the cells in the body. 
of embryonic tissue: the ectoderm, endoderm, and
mesoderm. Each of these three tissue layers contains stem cells
with the potential to develop into some but not all of the
cell types needed in the human body. The ectoderm gives
rise to the central and peripheral nervous systems, the
mammary glands, the pituitary gland, and tooth enamel,
whereas the endoderm forms the lining of the gastrointestinal
tract and the mesoderm generates bone, cartilage, connective
tissue, and muscle cells.


FATE 3: APOPTOSIS IS PROGRAMMED 
CELL DEATH 


During embryo development and throughout adult
life, body cells are selected to die at appropriate times
and are sometimes replaced by other, more specialized
cells. Precursor cells give rise to different types
of specialized cells that perform specific functions,
many of which have life spans measured in days or weeks
rather than years. Apoptosis, or cell suicide, occurs routinely in
the human body, sometimes to remove unneeded cells
and other times as a result of injury to tissues and cells
(Figure 9.37). As suggested by the alternative name
“programmed cell death” (PCD), apoptosis proceeds


FIGURE 9.37 Apoptosis (cell suicide) occurs when genome damage cannot be repaired and the cell triggers its own death. (A) Once
apoptosis is triggered (top), the cell begins to shrink. The chromosomes condense, the DNA is cut into small fragments (bottom), and the nucleus
collapses. The plasma membrane begins to form bubbles and blebs. Apoptotic bodies form, and the cell debris is engulfed by macrophage
cells that are attracted to the dying cell by the blebbing membranes. (B) A white blood cell (right) dying by apoptosis (programmed cell
death). (C) A cancer cell (right) undergoing apoptosis.




through a series of genetically controlled biochemical
events that cause characteristic changes and trigger cell
death. When a cell follows this pathway toward
self-destruction and the apoptosis genes are expressed,
the cell undergoes a series of morphological changes,
including blebbing of the cell membrane, cell shrinkage,
and chromosome fragmentation. As the cells die,
they form apoptotic fragments of cell debris that are
subsequently removed by macrophages.


Apoptosis Makes Embryos with 10 Fingers 
and 10 Toes 


Cell death is carefully controlled during the intricate
cellular processes that occur during embryogenesis of
plants and animals. In fact, a defect in apoptosis can
cause a common birth defect that affects the fingers of
newborn babies. Human embryos develop with tissue
growing between their fingers and toes that normally
disappears before birth. However, sometimes the webbing
persists after an infant is born. This birth defect,
called syndactyly, can be caused by a mutation in a
gene that prevents apoptosis in the webbing between
the fingers of the embryo (Figure 9.38). Depending on
the severity of the condition, the fused fingers are often
surgically separated during infancy or early childhood. Most
inherited forms of this disorder are passed on as autosomal
dominant traits (see Chapter 10).


FIGURE 9.38 Apoptosis gives babies their fingers and toes! 
(A) Day 51 human hand. (B) Growing embryo shows webbing. (C) 
Day 48 human hand. (D) Hand with syndactyly. 
Apoptosis is an important focus for the discovery of
novel cancer treatments, such as new anticancer drugs 
that induce apoptotic death in the cancer cells without 
affecting the surrounding healthy cells. 


SUMMARY 


The purpose of this chapter is to explain the genetic
and biochemical mechanisms used by eukaryotic cells
to determine cell fate. The term “cell fate” refers to
what happens to a cell: it can divide by mitosis or
meiosis (processes that maintain or change chromosome
number, or ploidy, during the cell cycle), develop into a
specialized cell type, or die by apoptosis. This chapter
discusses the origin of more than 200 different cell types
required by the adult human body. The embryonic stem
cells differentiate and generate the new shapes, structures,
and functions of the highly specialized cells that perform
functions such as transmitting nerve impulses or secreting
digestive enzymes. Most human cells do not last a lifetime.
When highly specialized cells wear out and can no longer
function adequately, they are replaced using a natural
recycling system. The human body also needs to replace
cells damaged by disease or trauma, which often involves
apoptosis (programmed cell death).

The human genes involved in making cell fate
decisions control the cell cycle and regulate gene
expression, and when mutated they can convert normal
cells into cancer cells. Proto-oncogenes, oncogenes, and
tumor suppressor genes all play key roles in the prevention
and development of cancer cells.


REVIEW 


In this chapter we discussed how eukaryotic cells exe- 
cute gene expression patterns that determine cell fate: to 
divide by mitosis or meiosis, to develop into a special- 
ized cell type, or to commit to apoptosis. To test your 
comprehension of the chapter’s contents, answer the 
following questions: 


1. What types of information (signals) does a cell use 
to decide its fate? 

2. What types of cell fates can a cell choose? 

3. Cyclin proteins get their name from what charac- 
teristic feature? 

4. Indicate the order in which the four phases of the cell
cycle occur in normal cells.

5. What happens to the checkpoint feedback mechanism
in cancer cells?

6. Explain the relationship between proto-oncogenes
and oncogenes.

7. Describe the most important features of one of the
tumor suppressor genes (and proteins) described in
the chapter.

8. Explain the important role that apoptosis plays in 
embryogenesis and fetal development. 

9. Explain the structure and function of MPF in the 
cell cycle. 

10. Explain the role of metastasis in cancer disease. 
Autism Symptoms Reversed in Lab


BBC News, June 27, 2007 

Reversing autism, even in lab mice, is really amazing
news. For the first time ever, the symptoms of mental
retardation have been reversed in laboratory mice. By 2007,
autism was estimated to affect about 1 in every 150 children
in the United States. Mutations in the fragile X mental
retardation 1 (FMR1) gene are currently the most common
inherited cause of mental retardation and autism in humans.
To create a model animal with autism symptoms to study
in the lab, the scientists made “knockout (KO) mice,” which
no longer carry a functional FMR1 gene in their genomes. The
FMR1 KO mice showed evidence of cognitive disorders and
exhibited hyperactive, purposeless, and repetitive actions,
behaviors that are quite different from those of normal mice.

From human studies, scientists knew that an enzyme
called PAK3 (p21-activated kinase 3) was implicated in
mental disorders and might be a good target for developing
new drugs to treat autism. Scientists tried to reverse
the symptoms of autism in the FMR1 KO mice by blocking
the action of the PAK enzyme in the mouse brain. Not
only was the treatment effective, it even worked in mice
that exhibited pronounced symptoms associated with
autism. Microscopic analysis of brain tissue from
FMR1 KO mice treated to block PAK activity showed that
the treatment repaired damaged nerve cells
and rebuilt the connections between neighboring nerve cells,
restoring proper electrical communication between the
cells in the mouse brain.

“This is very exciting because it suggests that PAK 
inhibitors could be used for therapeutic purposes to 
reverse already established mental impairments in fragile X 
children.”—Professor Eric Klann, New York University 
Center for Neural Science. 


When scientists reported that they had successfully
“cured” fragile X syndrome and autism in mice
for the first time, it might have seemed like modest progress.
But genetic disorders affecting nerve and brain function
in humans and other mammals are caused
by complicated interactions among many genes.
Defective nerve cell development can result in a variety
of mental deficits and physical limitations that
are difficult to study in humans. The results of blocking
PAK activity in the FMR1 KO mice were very encouraging
and have prompted scientists to search for drugs that are
safe to use in humans and effectively block PAK3 activity in
people with autism or fragile X syndrome.


LOOKING AHEAD 


This chapter describes some of the many thousands 
of normal (wildtype) genes and mutant genes that are 
implicated in causing some common genetic diseases 
and disorders in humans. On completing the chapter, 
you should be able to do the following: 


• Describe the differences between simple and
complex genetic diseases in humans.

• Explain how a mutation in a gene affects the
product of the mutant gene.

• Appreciate the role of the environment in damaging
DNA and in the development of human genetic
disease.

• Make the connection between biomedical research
involving animals and the essential treatments for
human diseases that directly result from research
on animals.

• Explain how the biological families of patients with
genetic diseases contribute to our understanding of
human gene function.

• Understand the structure of a DNA probe, and
explain how it is used to identify the location of a
target gene in a chromosome.

• Describe the role that the DNA differences between
individual human genomes have played in searching
for human genes involved in genetic diseases.

• Have a better understanding of the functions of the
macromolecular machines and how the machines
are assembled from many protein parts in the cell.

• Access the most accurate and up-to-date information
about human genetic diseases available using
a computer and the Internet.


INTRODUCTION 


The past 50 years have seen amazing progress in our
understanding of genes and how genes function in the
human body. Scientists from all over the world collaborated
to determine the entire DNA sequence (the exact
order of the bases in the DNA) of all 3.2 billion bases
that make up the human genome (see Chapter 6).
Ongoing genome studies have focused on learning
about the functions of the proteins encoded by
approximately 20,000 human genes. However, even though
we know the entire human genome DNA sequence,
the identities and functions of all 20,000 individual
genes are not yet known. Unfortunately, the function
of a gene product, usually a protein, is often not
obvious from DNA sequence analysis alone, and
additional studies are usually required to understand
how different proteins function in the cells. The technical
ability to explore the structure and function of
genes at the level of the DNA molecule has revolutionized
our concept of how genes work and substantially
increased our understanding of how a mutation alters
the structure and function of a protein and sometimes
changes the fate of the entire organism.


GENETIC DISEASES ARE CAUSED BY 
MUTANT GENES 


Humans all start life by inheriting one version of each
gene from their mother and one version of each gene
from their father. These versions of the same gene are
called alleles. The maternal and paternal alleles of a gene
can be slightly different from each other in DNA sequence;
one allele might be the normal (wildtype) version of the
gene, whereas the other allele might be a mutant version
of the gene. The consequence to the person who inherits
these alleles depends on the specific gene involved, the
nature of the mutation, and to some extent on the
environment in which the person lives. The interaction of
inherited normal and mutant genes with the many factors
in an individual’s environment is so complex that, at
present, we do not have all the information needed
to fully understand the impact of environment on
genetics. However, because we can usually determine the
genes involved and the effects of mutations in those genes,
this genetic information is often the starting point for
unraveling the complex networks of genes that dramatically
influence our health, personality, and, in fact,
every aspect of our lives.

Many different kinds of gene mutations are discussed
in this chapter, and they all represent changes in the
linear sequence, or order, of the DNA bases. Even a very
small change in the DNA sequence can have drastic
consequences, or it can have no effect at all. The impact of
small DNA changes is evident from studies on hemoglobin,
an essential protein in the red blood cells that carries
oxygen in the bloodstream. Adult hemoglobin (HbA)
contains the β-globin protein, which binds to oxygen. A
specific single base pair change (called a point mutation)
in the β-globin gene sequence alters one amino acid in the
β-globin protein, producing hemoglobin S (HbS), and
causes the disease sickle cell anemia (Figure 10.1A).

The mutant HbS proteins made inside the red
blood cells of a person with sickle cell anemia clump
together and form long, insoluble rod-shaped polymers,
which distort the red blood cells into characteristic
FIGURE 10.1 Mutations in β-globin cause blood cell diseases. (A) A point mutation in β-globin makes sickle cell protein. The wildtype
β-globin gene makes the normal protein, HbA, which is found in healthy red blood cells (left). This point mutation in the β-globin gene
causes a valine (val) amino acid to be substituted for a glutamic acid (glu) amino acid to make the hemoglobin S protein (HbS); the mutant HbS
protein makes the red blood cells become sickle-shaped and causes sickle cell anemia (right). (B) A point mutation in β-globin makes the short
protein that causes β-thalassemia. The wildtype β-globin gene makes the normal protein, HbA, which is found in healthy red blood cells
(left). The C to T mutation in the DNA encodes a UAG stop codon in the mRNA, which in turn causes premature termination of the growing
β-globin amino acid chain. The result is a short, nonfunctional mutant β-globin protein.


sickle or banana shapes. The sickle-shaped red blood 
cells tend to clog the smallest blood vessels, the capil- 
laries, and interfere with the blood flow in the body, 
causing severe pain and other disabling symptoms. 

A different point mutation in the β-globin gene creates
a new protein synthesis stop signal, UAG, in the
β-globin mRNA. This change replaces a glutamine (gln)
amino acid in the β-globin protein with a stop codon
that forces the production of a prematurely shortened
β-globin protein chain, which ends at codon 39
instead of continuing to the usual 146 amino acids (Figure 10.1B).
The short β-globin protein is functionally useless and
causes the blood disorder β-thalassemia.
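The codon-level effect of the two β-globin point mutations described above can be illustrated with a short Python sketch. It assumes a made-up six-codon coding sequence, not the real β-globin gene: one substitution turns a GAG (glutamic acid) codon into GTG (valine), the same kind of codon change that produces HbS, and another turns a CAG (glutamine) codon into the TAG stop codon, as in the β-thalassemia mutation. The helper names `translate` and `point_mutation` are hypothetical; the codon table is the standard genetic code written for the DNA coding strand.

```python
# Standard genetic code, built from the usual T, C, A, G ordering of codon positions.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(coding_dna: str) -> str:
    """Translate coding-strand DNA codon by codon, stopping at the first stop codon ('*')."""
    protein = []
    for pos in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def point_mutation(coding_dna: str, position: int, new_base: str) -> str:
    """Return a copy of the sequence with the base at one 0-based position replaced."""
    return coding_dna[:position] + new_base + coding_dna[position + 1:]


# Hypothetical gene: ATG CCT GAG CAG AAG TAA -> Met Pro Glu Gln Lys (stop)
wildtype = "ATGCCTGAGCAGAAGTAA"
missense = point_mutation(wildtype, 7, "T")  # GAG (Glu) -> GTG (Val), as in HbS
nonsense = point_mutation(wildtype, 9, "T")  # CAG (Gln) -> TAG (stop), as in beta-thalassemia

print(translate(wildtype))  # MPEQK
print(translate(missense))  # MPVQK
print(translate(nonsense))  # MPE
```

Each mutant differs from the wildtype by a single base, yet the missense change swaps one amino acid while the nonsense change cuts the protein short, which mirrors the difference between sickle cell anemia and β-thalassemia.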

Humans inherit one copy of each gene from each
biological parent, so it is possible to inherit two normal
(wildtype) versions of the gene, two mutant versions of
the gene, or one wildtype copy and one mutant copy
of the gene. When a person inherits two different versions
(alleles) of the same gene, does the trait controlled
by one allele “win out” over the other? The answer is
sometimes yes, sometimes no. For our purposes, there
are two possibilities. First, both gene products may be
made in the cell and both proteins may have an
effect, a situation called codominance. A second
possibility is that one version of the gene makes a functional
protein product and is the dominant allele, whereas the
other version of the gene makes a nonfunctional product
and is the recessive allele. The dominant allele is
the form of the gene whose effect is seen in the
organism as a trait. This observable characteristic is called the
phenotype (blue eyes, black hair, an inherited disease,
or other features).
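A minimal Python sketch of the simplest case, one gene with a dominant and a recessive allele, can make these rules concrete. The allele names `A` and `a` and the helper functions `punnett` and `phenotype` are hypothetical; the sketch assumes complete dominance and that each parent passes on either of its two alleles with equal probability.

```python
from collections import Counter
from itertools import product


def punnett(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for one gene with two alleles.

    Each parent is written as a two-letter genotype such as "Aa". Every
    allele from parent 1 is paired with every allele from parent 2, so
    the four boxes of the 2 x 2 Punnett square are equally likely.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # sorted() puts the uppercase (dominant) allele first, so "aA" and "Aa" count together.
        offspring["".join(sorted(allele1 + allele2))] += 1
    return offspring


def phenotype(genotype: str) -> str:
    """With complete dominance, one dominant allele is enough to show the dominant trait."""
    return "dominant" if any(allele.isupper() for allele in genotype) else "recessive"


# Two parents who each carry one dominant (A) and one recessive (a) allele.
children = punnett("Aa", "Aa")
print(sorted(children.items()))  # [('AA', 1), ('Aa', 2), ('aa', 1)]

recessive = sum(n for g, n in children.items() if phenotype(g) == "recessive")
print(f"{recessive} of {sum(children.values())} combinations show the recessive trait")
```

For a recessive disease such as sickle cell anemia, `a` would stand for the mutant allele, so each child of two carriers has a 1-in-4 chance of inheriting two mutant copies. Under codominance, the `phenotype` rule would need a separate outcome for `Aa`, because both gene products have an effect.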


A “point mutation” changes only one DNA base in an entire 
gene sequence, yet the consequences to the organism can 
be very serious; witness how a point mutation can damage 
the structure and function of hemoglobin. 


Genes in human chromosomes send instructions 
to the cell by the process of gene expression (Figure 
10.2A). Two copies of each gene are present in the 
genome of each cell, but at any one point in time, 
some genes are expressed (turned on) while others are 
silent (turned off). When a gene is turned on, the DNA 
sequence encoding the gene is copied into a long pre- 
cursor RNA, which is then processed by RNA splicing 
to make the final messenger RNA (mRNA). The mRNA 
is exported out of the nucleus (through a pore in the nuclear
membrane) to the cytoplasm, where it delivers a copy of
the specific genetic information to the ribosome, which 
produces proteins in the cell according to the specific 
instructions encoded in the mRNA. A gene without a 
mutation usually produces a protein that functions cor- 
rectly. However, a mutant gene can encode a protein 
that is defective and is unable to function correctly in 
the cell (Figure 10.2B). 
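The gene expression steps just described (transcription, RNA splicing, and translation) can be traced in a short Python sketch. It assumes a tiny made-up gene with one intron whose position is supplied by hand; the intron begins with GT and ends with AG, as most real introns do, but a real cell finds introns through splicing signals and the spliceosome, which the sketch does not model. The functions `transcribe`, `splice`, and `translate` are hypothetical helpers, and the codon table is the standard genetic code written with RNA bases.

```python
from typing import List, Tuple

# Standard genetic code written with RNA bases, in the usual U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """Precursor RNA has the coding-strand sequence, with U in place of T."""
    return coding_strand.replace("T", "U")


def splice(precursor: str, introns: List[Tuple[int, int]]) -> str:
    """Cut out introns given as (start, end) positions, end exclusive, and join the exons."""
    exons, previous_end = [], 0
    for start, end in sorted(introns):
        exons.append(precursor[previous_end:start])
        previous_end = end
    exons.append(precursor[previous_end:])
    return "".join(exons)


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon ('*') or the end of the RNA."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical gene: exon 1 + intron + exon 2 (not a real human sequence).
gene = "ATGAAA" + "GTAAGTTTTCAG" + "TGGTAA"
INTRONS = [(6, 18)]  # the intron occupies positions 6 to 17

precursor = transcribe(gene)
mrna = splice(precursor, INTRONS)
print(mrna)                  # AUGAAAUGGUAA
print(translate(mrna))       # MKW
print(translate(precursor))  # MKVSFQW (intron left in: wrong protein)

# A point mutation in exon 2 (TGG -> TAG) is carried into the mRNA and the protein.
mutant_gene = gene[:19] + "A" + gene[20:]
print(translate(splice(transcribe(mutant_gene), INTRONS)))  # MK
```

Translating the unspliced precursor gives a different protein, which is why splicing must be completed before the mRNA reaches the ribosome. The last line follows a single-base change in an exon through transcription and splicing into a shortened protein, the situation shown for a mutant gene in Figure 10.2B.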


10,000 HUMAN GENES POTENTIALLY 
CAUSE GENETIC DISEASES 


Scientists have known for years that inherited diseases 
are caused by mutations in genes, which in turn produce 
FIGURE 10.2 Wildtype and mutant genes are expressed using similar pathways. (A) Overview of the major gene expression pathway used by
eukaryotic cells. A gene is copied into many long RNA transcripts in the nucleus. Most human genes require splicing; the interrupted coding 
regions of the gene are copied into long precursor RNAs that contain both introns and exons. These long RNA transcripts are processed by 
RNA splicing, which removes the introns from the precursor RNAs to make “mature” mRNAs containing exons in the nucleus. RNA splicing 
is required before the mRNAs can be transported to ribosomes in the cytoplasm, where each mRNA is translated into a chain of amino acids 
specific for that protein. (B) Closeup of the expression of wildtype and mutant genes and proteins. The gene DNA sequence is copied into a 
precursor RNA (transcription) that is spliced to make the mRNA in the nucleus. (The X indicates a mutation in the DNA that changes the DNA 
sequence of the gene.) The mRNA is transported out of the nucleus and into the cytoplasm where the mRNA is translated by the ribosome 
into a specific protein. The mutation in the DNA sequence is transferred into the RNA sequence by transcription and is then incorporated into 
the mutant protein product in the form of incorrect amino acids. 


defective proteins that fail to function properly in the
cell. When the human genome DNA sequence was
completed, it became clear that in the past we had
overestimated the total number of human genes and
probably underestimated the number of genes involved
in human diseases. In 1966, the noted founder of the
science of medical genetics, Victor McKusick of Johns
Hopkins University, cataloged the 1500 genetic
diseases known in humans at that time in his classic
book, Mendelian Inheritance in Man. Now, more
than 40 years later, McKusick is lead editor of the
Online Mendelian Inheritance in Man (OMIM) web site, with
a database of more than 10,000 human genes associated
with human diseases (Figure 10.3).


The flow of genetic information in the cell starts with the
gene’s DNA, is transferred to the messenger RNA (mRNA), and
then to the protein product: DNA to mRNA to protein.


All human genetic diseases pose complex scientific
and medical questions, whether the disease involves a
single base pair change in a single gene or is caused
by mutations in many genes. The impact of even
a single defective gene product (protein) on cellular


FIGURE 10.3 Victor McKusick is the “Father of Genetic Medicine.” 


functions can be very complex. Many genes are part of 
a network of genes encoding proteins that control the 
expression of other genes (Figure 10.4). Mutant genes 


McKusick is the lead editor of the Online Mendelian Inheritance in Man
(OMIM) web site. OMIM has information on more than 10,000
human genes that are associated with human diseases.
often, but not always, produce defective proteins,
which fail to perform their jobs in the cell. To determine
how a mutant gene might cause a particular disease,
scientists study how the gene and the protein function
in normal cells.

With so many human genes implicated in genetic 
diseases, it is not possible to discuss them all in this 
chapter, or even in this book. Instead we have decided 
to focus on certain human genes and diseases that are 
particularly instructive, showing key concepts in gene 
expression and protein function, as well as those with 
especially interesting science stories to tell. A section 
at the end of this chapter provides advice about how 
to find accurate and reliable online information about 
the thousands of genes and diseases that we will not 
be able to cover here. 


At least one-half of the 20,000 genes in the human genome
have been implicated in causing genetic diseases in people.


Environment Can Cause Mutations in 
Human Genome DNA 


To a great extent, the expression of our genes controls
who and what we are, but the impact of our genes
is also strongly influenced by various environmental
factors. Heart disease is a good example of how an
FIGURE 10.4 Expression of some genes
can affect the expression of other genes.
The expression of genes in human cells is
complicated and highly regulated. Genes
are turned on or turned off, and the amount
of gene product is adjusted (increased or
decreased) depending on the needs of the
cell. A gene coding for a transcription factor
protein can in turn control the expression of
many other genes through the function of
the transcription factor.


individual's environment can impact the overall risk
of a disease. Factors known to increase the risk for 
coronary disease include obesity, lack of exercise, and 
a family history of heart attacks. The risk of heart disease 
for any one individual depends on that person’s genes, 
the effects of genetic mutations, and the cumulative 
results of a lifetime of environmental exposure. 

Inheriting a mutant gene (or genes) is one way to
acquire a genetic disease, but it is not the only way.
A genetic disease can be caused by a gene mutation
that was not inherited but instead arose spontaneously
in the DNA sequence encoding the gene. Damage to a
person's DNA can be caused by factors in the environment,
such as exposure to sunlight, hazardous chemicals, and
radiation (not including medical or dental x-rays)
(see Chapter 9). Sun exposure damages DNA because
sunlight is the primary environmental source of highly
mutagenic UV (ultraviolet) radiation. If the damage to the
DNA in the genome is not repaired, the mutated gene may
produce defective proteins in the cell.

A spontaneous mutation will persist in the genome
DNA of the original cell and in all cells derived from
the original cell. However, if the spontaneous
mutation does not alter the genome of a sex cell (egg or
sperm), the mutation will not be passed on to offspring.
The blood-clotting disorder hemophilia is usually caused
by inheriting mutant genes that encode defective
blood-clotting factors. Surprisingly, 30% of the
FIGURE 10.5 Man’s best friend sleeping on the job? These dogs with the sleep disorder narcolepsy are playing actively (top left) until
suddenly they reach a threshold of excitement and they collapse together in a tangled heap, unable to move a muscle (top right, bottom). Frozen
in a state of cataplexy, the dogs are still and silent. A few seconds or minutes later, the dogs jump up, shake it off, and carry on as before,
until the next exciting event, such as dinnertime!


people diagnosed with hemophilia had acquired the 
disease due to spontaneous mutations in the clotting 
factor genes. 


A combination of genetic and environmental factors influences
who we are and who we become, from personality
to body type. The environment also influences the impact
of many inherited diseases, although no known vitamins
or amount of exercise can prevent the most serious
consequences of even a “simple” genetic mutation such as sickle
cell anemia.


A relatively small number of diseases are monogenic,
caused by mutations in a single human gene, including
Huntington’s disease, cystic fibrosis, alpha-1 antitrypsin
deficiency, adenosine deaminase (ADA) deficiency,
neurofibromatosis 1, phenylketonuria (PKU), severe
combined immunodeficiency syndrome (SCID), and
sickle cell disease (Table 10.1). Most inherited diseases
(and human traits) are multigenic, influenced by many genes.
Multigenic disorders include heart disease, hypothyroidism,
colon cancer, other cancers, Alzheimer’s
disease, and diabetes.


INCONSISTENT GENETIC TESTING LAWS 


The information obtained from identifying the mutant 
gene involved in a disease can often be used to 
develop a genetic test to detect people who might be 
at risk for a particular inherited disease. The develop- 
ment of gene-specific DNA probes has made it routine 
to test for a specific mutant or wildtype gene allele in 
an individual’s genome. Increased genetic testing has 
brought about the need for more consistent access to 
reliable genetic counseling. The genetic screening tests 
usually given to newborn infants in the hospital shortly 
after birth have proven to be an extremely effective 
way to avoid the consequences of inheriting certain 
TABLE 10.1 Human diseases caused by single gene mutations

Adenosine deaminase (ADA) deficiency
Description: Immune disease makes the body open to severe infections
Symptoms: Growth retardation, opportunistic infections, poor/little immune system

Alpha-1 antitrypsin deficiency
Description: Lack of a liver protein that normally blocks destructive enzymes; deficiency leads to emphysema and liver disease
Symptoms: Shortness of breath with wheezing, unintended weight loss, fatigue, respiratory infections, rapid heartbeat on standing, vision problems

Cystic fibrosis (CF)
Description: Chronic illness affects lungs, digestive system, and other body organs
Symptoms: Very salty skin, shortness of breath, wheezing, coughing phlegm, frequent lung infections, poor growth and weight gain, difficulty with bowel movements

Neurofibromatosis
Description: Progressive neurological disorder; abnormal embryonic neural cells
Symptoms: Café-au-lait spots on the skin and freckles, neurofibromas, bone defects, bilateral acoustic tumors (type 2)

Phenylketonuria (PKU)
Description: Genetic disorder prevents body from utilizing the amino acid phenylalanine
Symptoms: Light skin color (relative to biological family), eczema, possible mental retardation

Severe combined immunodeficiency syndrome (SCID)
Description: Immune deficiency; abnormal T and B lymphocytes
Symptoms: Severe infections in first several months of life; pneumonia and meningitis


devastating genetic diseases. Currently, these screen- 
ing tests identify 29 treatable genetic disorders using 
just a few drops of blood taken from a pin prick in a 
newborn's heel. 

The genetic screening tests are relatively inexpensive and are essential to identify newborns at risk. Testing immediately after birth makes it possible for babies to receive early treatment that avoids lifelong disability or death. One of the earliest tests available detected the inherited metabolic disorder phenylketonuria (PKU), which causes severe mental retardation if not rapidly diagnosed; early treatment avoids permanent damage to the brain. The importance of these newborn screening tests is clear, but progress toward guaranteeing testing for all infants born in the United States is very slow. Surprisingly, in 2005 only 38% of American infants were born in states that screened for at least 21 of the 29 treatable genetic diseases. Many groups have recommended for years that states adopt federal guidelines requiring that every infant born in the United States be tested for the same 29 genetic disorders. Federal standards are needed to correct the widespread inconsistencies that exist between state-run testing programs across the country.


GENETIC DISEASES ARE FREQUENTLY 
CAUSED BY MORE THAN ONE GENE 


Most human genetic diseases are caused by the actions 
of more than one gene, together with influence from 
various environmental factors. To find the multiple 
genes involved in a complex condition such as heart 


disease, it is necessary to conduct large genetic studies 
involving many people. The Wellcome Trust sponsored 
a large study that tested DNA samples from 17,000 British
residents and processed the equivalent of almost 10
billion pieces of genetic information. (The Wellcome 
Trust is the world’s largest medical research charity 
funding research into human and animal health.) The 
scientists used the most recent gene mapping tech- 
niques based on SNP markers (DNA polymorphisms) 
to successfully identify several new genes associ- 
ated with bipolar disorder, Crohn’s disease, coronary 
heart disease, hypertension, rheumatoid arthritis, and 
diabetes. The study is also pursuing the human genes 
involved in tuberculosis, breast cancer, autoimmune 
thyroid disease, multiple sclerosis, and ankylosing 
spondylitis (Table 10.2). 
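Studies like this one compare SNP markers in people who have a disease (cases) with people who do not (controls). A minimal sketch of that comparison for a single SNP, assuming a simple allelic case-control test (a Pearson chi-square on a 2 × 2 table of allele counts) and made-up counts; it illustrates the general technique only and is not the Wellcome Trust study’s actual analysis. The function names are hypothetical.

```python
def allelic_chi_square(case_alleles, control_alleles):
    """Pearson chi-square (1 degree of freedom) for a 2 x 2 allele-count table.

    Each argument is a pair: (copies of allele A, copies of the other allele).
    Larger values mean the allele frequencies differ more between
    people with the disease (cases) and people without it (controls).
    """
    table = [list(case_alleles), list(control_alleles)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count if no association
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def allelic_odds_ratio(case_alleles, control_alleles):
    """Odds of carrying allele A in cases divided by the odds in controls."""
    (case_a, case_other), (ctrl_a, ctrl_other) = case_alleles, control_alleles
    return (case_a * ctrl_other) / (case_other * ctrl_a)


# Hypothetical counts: 2000 cases and 3000 controls, two alleles per person.
cases = (1400, 2600)     # allele A frequency 35% in cases
controls = (1800, 4200)  # allele A frequency 30% in controls
print(round(allelic_chi_square(cases, controls), 2))  # 27.57
print(round(allelic_odds_ratio(cases, controls), 2))  # 1.26
```

In a genome-wide study this comparison is repeated for hundreds of thousands of SNPs, which is why a very strict significance threshold is needed before a SNP is reported as associated with a disease.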


A genetic mutation that alters the three-dimensional shape 
of the protein product prevents the mutant protein from 
functioning properly, often because the misshapen protein 
cannot fit together with other proteins to assemble a func- 
tional molecular machine. 


The human brain is the most amazing, intricate, 
and complicated organ in the body, so it makes sense 
that diseases affecting the brain are also very complex. 
Alzheimer’s disease can bring early memory loss and dementia even in middle-aged people. Named in the early 1900s for Alois Alzheimer, a German psychiatrist and neuropathologist, the disease causes a progressive loss of intellectual function, followed by the inability to speak, walk,
TABLE 10.2 Human diseases caused by mutations in more than one gene

Heart disease
Description: Several diseases affect the heart and the blood vessels, including coronary artery disease (CAD) and heart failure
Symptoms: Chest pain, shortness of breath, fatigue; symptoms of each heart disease differ in men and women

Hypothyroidism
Description: Deficiency in thyroid hormone causes slow metabolism; at birth causes cretinism
Symptoms: Dry skin, puffy face, hair loss, slow speech and heart rate, mental retardation

Colon cancer
Description: Colon cancer can metastasize; early detection is essential; a leading type of cancer in both males and females
Symptoms: Fatigue, weakness, shortness of breath, change in bowel movements, abdominal pain, cramps, and bloating

Alzheimer’s disease
Description: Impaired higher intellectual and cognitive brain function; progression to dementia over a 5- to 10-year period
Symptoms: Memory loss, confusion, hallucinations, emotional instability, inability to concentrate, perform daily activities and personal care

Diabetes mellitus
Description: Chronic disease; very high glucose levels in the blood (hyperglycemia); insulin missing or not functional; type I and type II diabetes
Symptoms: Thirst, blurry vision, hunger, fatigue, dry itchy skin, weight loss, excess urination, tingling in the hands and/or feet, sores that take a long time to heal


or perform even basic skills. The cause of Alzheimer’s disease was completely unknown until 1987, when researchers identified the first human gene linked to the development of the abnormal brain tissues that characterize the disorder (Figure 10.6). Brain tissues analyzed from people with Alzheimer’s disease typically show fibrous plaques made by the accumulation of short amyloid proteins. Some cases of familial Alzheimer’s disease (FAD) are caused by mutations in the gene located on human chromosome 21 that codes for the amyloid precursor protein (APP). The large APP protein is located in nerve cell membranes at the junctions between nerve cells (synapses). Abnormal cleavage of the APP precursor protein produces the short amyloid proteins that form the characteristic amyloid plaques in Alzheimer’s brains. The protein complex that cleaves APP requires presenilin, a protein whose gene, when mutated, has been identified as a major genetic risk factor for Alzheimer’s disease.


Diseases like Alzheimer’s are especially hard to study 
because diseases of the brain are often caused by defects 
in the complicated interplay of the many genes involved in 
the specialized development and function of nerve cells. 


Amyotrophic lateral sclerosis (ALS) is commonly 
known as Lou Gehrig’s disease in honor of the New 
York Yankees first baseman who died from ALS in 1941 
(Figure 10.7). ALS is a progressive disease that destroys 
motor nerves in the brain and the spinal cord, causing 
muscle weakness, paralysis, and death. Surprisingly, 


FIGURE 10.6 Plaques and tangles form in the brains of Alzheimer’s 
patients. Alois Alzheimer studied the brain tissue and identified 
the characteristic changes that are now known to be hallmarks of 
Alzheimer’s disease. 


only 5% to 10% of ALS cases are inherited; the remaining cases are sporadic, occurring in people with no family history of the disease. An international team of scientists identified a mutant form of the superoxide dismutase (SOD1) gene by studying people with ALS from 13 families. In healthy cells, the SOD1 enzyme eliminates toxic free radicals, an important function because free radicals attack and damage DNA and proteins. The defective SOD1 enzyme also fails to control the function of an important transporter protein that normally removes excess glutamate from around nerve cells. High levels of glutamate are especially toxic to nerve cells, so the brain relies on normal SOD1 activity both to eliminate free radicals and, indirectly, to keep glutamate levels under control.
FIGURE 10.7 Lou Gehrig, renowned New York Yankees first baseman. Lou Gehrig died of ALS in 1941.


Many healthy people find it difficult to imagine liv- 
ing every day with the challenges of a chronic, disa- 
bling disease like ALS. Stephen Hawking is a man who 
has accomplished more than most people in his lifetime 
of 60 or so years. Born in England, where he lives and 
works as an award-winning astrophysicist, Hawking 
has three children and one grandchild. In addition to 
teaching and traveling worldwide giving seminars on 
his work, Hawking wrote several books, including the
best-selling A Brief History of Time. And
Stephen Hawking has ALS (Figure 10.8).

On his web site, in plain and often humorous lan- 
guage, Stephen Hawking answers many of the questions 
that people have about living with a serious disease. 
Hawking describes his childhood and his education in 
England. He explains how he was diagnosed with ALS, 
and he discusses finding a place to live and a job. Also 
in plain language, Stephen Hawking describes his stel- 
lar research that changed our understanding of the basic 
laws of the universe, beginning with the Big Bang and 
ending in what Hawking calls the “not-quite-black”
holes. About ALS, Hawking says, “I am quite often
asked: How do you feel about having ALS? The answer 
is, not a lot. I try to lead as normal a life as possible, 
and not think about my condition, or regret the things it 
prevents me from doing, which are not that many.” 
FIGURE 10.8 Astrophysicist Stephen Hawking. This amazing scientist, author, and father lives with the devastating disease ALS.


Free radicals damage DNA and proteins, but a diet rich in colored vegetables is the best source of antioxidants to fight the free radicals in your cells. Recent studies suggest that dietary supplements marketed as antioxidants do not provide this benefit.


Diabetes and Obesity Epidemics in the 
United States 


Type I and type II diabetes are serious, complex metabolic disorders that affect both animals and humans. Interactions among mutations in at least 15 genes have been linked to type I diabetes in humans. The characteristic feature of type I diabetes is the failure of the pancreas to produce insulin, a protein hormone that is required for cells to take up sugar from the blood for use in energy metabolism (Figure 10.9). People with type I (or juvenile-onset) diabetes often develop the disease as children and are insulin-dependent, requiring daily injections of animal-purified or recombinant DNA-produced insulin (Humulin) to survive. People who develop type II diabetes (also called adult, maturity, or late-onset diabetes) are often overweight adults with sedentary lifestyles, both important risk factors for type II diabetes. Type II diabetics can often rely on oral medications to control their blood sugar levels, but neither oral drugs nor insulin injections are a cure for type I or type II diabetes. These treatments delay the physiological damage to the body that occurs when blood sugar levels are not controlled. For example, the tiny capillary blood vessels and nerves in a diabetic’s hands and feet and the retina in the eyes are particularly sensitive to the detrimental effects of high blood sugar levels, making blindness and loss of feeling in the extremities just two of the many complications of this disease.


The diabetes epidemic is international. Diabetes (type I and type II) will kill about 3.8 million people worldwide in 2007, about the same number killed by AIDS. In the next 20 years, diabetes is projected to increase by 80% in Africa, 100% in Latin America, and 43% in the United States.


For many people, obesity is an intractable medical problem that significantly increases the risk of many other conditions in addition to diabetes, including high blood pressure, heart disease, and breathing problems. In some cases, excess body weight is inherited and has a genetic and biochemical basis. Obesity is increasing among U.S. children, accompanied by a staggering rise in the number of children diagnosed with type II diabetes.

Scientists use mice to study the role of genes in weight control and obesity in humans. In the mid-1990s, researchers discovered a gene (the ob gene) that causes a severe form of inherited obesity when mutated in mice (Figure 10.10). The ob gene encodes a protein hormone called leptin, and similar leptin proteins are made in both mouse and human cells. Studies show that leptin, which is made mainly in fat cells, functions in weight control and helps the mouse to maintain its ideal body weight. Defects in
FIGURE 10.9 Insulin is a protein hormone used to treat diabetes. Insulin is made in the pancreas as a precursor protein containing three amino acid chains synthesized as one protein. Before the hormone is released into the blood, the insulin precursor protein is processed, which removes the C peptide and produces the active insulin hormone containing the A and B peptides, linked together by disulfide bonds (-S-S-).
FIGURE 10.10 The ob mouse gene encodes a weight control protein. The obesity gene (ob gene) in mice codes for a hormone protein called leptin. The mouse carrying a mutant ob gene gains weight (left), while the mouse with the normal ob gene makes functional leptin proteins and exhibits appropriate weight control.


the leptin protein cause a severe form of inherited obes- 
ity in mice and humans, another indication that mice 
and man use similar proteins and biochemical processes 
to control body weight. 

The leptin hormone is secreted into the blood by 
the fat cells and delivers a signal to the hypothalamus, 
the part of the brain that regulates food intake and 
energy expenditure in both the mouse and the human. 
Leptin has enormous potential as a drug if it can safely 
and effectively promote weight control in humans. 


The large number of obese children diagnosed with diabetes in the United States is now recognized as a public health emergency. The direct solution is to take immediate action: decrease caloric intake in the diet and increase physical activity.
HUMAN CHROMOSOME KARYOTYPES
REVEAL GENETIC DISEASES


The analysis of whole chromosomes plays a critical role 
in diagnosing genetic diseases and is equally important 
in routine prenatal genetic testing. Normally each cell in 
the human body contains 46 chromosomes in the 
nucleus. When a cell divides to make two cells, each 
new cell also contains 46 human chromosomes. The 
exceptions are the sex cells. The sperm and egg cells 
each contain 23 chromosomes, half of the 46 chromo- 
somes found in every body cell. This difference in chro- 
mosome number is extremely important because when 
the sperm and egg cells come together to make a ferti- 
lized egg, the egg receives 23 chromosomes from the 
mother (unfertilized egg) and 23 chromosomes from 
the father (sperm) for a total of 46 chromosomes (see 
Chapter 6). An analysis that reveals the number and 
structures of the human chromosomes can determine 
the sex of an individual as well as diagnose certain
genetic disorders. 

Human chromosomes dramatically change structure as the cell proceeds through the different stages of the cell cycle (see Chapter 9). Cells in different stages of mitosis can be identified by staining the chromosomes and the spindle fibers involved in cell division. In preparation for mitosis, the chromosomes condense as the cell begins to divide into two cells (see Chapter 9). After DNA replication is finished, each chromosome consists of two chromatids, each containing one duplicated DNA molecule that extends from one end of the chromatid to the other. To physically move these long strands of duplicated DNA, the cell tightly packages the DNA helix with special histone proteins into compact structures called mitotic chromosomes (Figure 10.11, A, B, and C). Once the chromosomes have been distributed to the daughter cells, cytokinesis occurs and the two new cells physically separate. The mitotic spindle apparatus is then disassembled until the next cell division cycle. The compact chromosomes decondense and become threadlike chromatin fibers, completing the chromosome cycle.


Cells must protect, duplicate, and segregate their genomes during each cell cycle. The cell starts by duplicating its chromosome DNA and then assembles a spindle fiber apparatus that attaches to the chromosomes and moves one copy of each duplicated chromosome into each of the progeny cells, ensuring that each cell receives the proper number of chromosomes.


Human mitotic chromosomes are prepared for morphological studies by cytological staining methods so that they are visible when magnified in a light microscope (Figure 10.12). The condensed chromosomes are taken from cells in mitosis and are stained with Giemsa dye to create a distinct banding pattern on each chromosome (see Chapter 9). The 46 chromosomes from individual cells are then sorted, identified, and arranged in a typical black-and-white human karyotype display. Today, spectral technology allows scientists to “paint” human mitotic chromosomes in a rainbow of colors and generate biologically accurate computer images of karyotypes with multicolored chromosomes (Figure 10.13). To make a spectral image, researchers treat the chromosomes with several different fluorescent tags attached to DNA probes that base pair with sequences on specific chromosomes. The light emitted from the bound probes is collected and sent to a camera to create the digital image. Although several of the fluorescent dyes appear to be the same color to the naked eye, subtle differences in their wavelengths can be detected by an interferometer, which uses the interference of light waves to determine distance (wavelength). The computer then assigns colors to the different chromosomes to display the digital “painted” karyotype. These spectral karyotypes are not only beautiful, but they also give scientists a much better look at chromosome structure, at a higher resolution than is possible with a standard karyotype test (Figure 10.14).
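Because only a handful of fluorescent dyes are available, spectral “painting” relies on using them in combination so that each chromosome carries its own mix. A minimal sketch of this counting idea, assuming five dyes (the names dye1 to dye5 are hypothetical placeholders) and an arbitrary assignment of combinations to chromosomes; it is not the software of any real imaging system. With n dyes there are 2^n - 1 non-empty combinations, so five dyes give 31, enough for the 24 different human chromosomes (1 to 22, X, and Y).

```python
from itertools import combinations


def assign_dye_combinations(chromosome_names, dyes):
    """Give each chromosome type its own unique, non-empty combination of dyes."""
    combos = [
        combo
        for size in range(1, len(dyes) + 1)
        for combo in combinations(dyes, size)
    ]
    if len(combos) < len(chromosome_names):
        raise ValueError("too few dyes to label every chromosome uniquely")
    return dict(zip(chromosome_names, combos))


def identify(detected_dyes, labels):
    """Look up which chromosome emits exactly this set of dyes."""
    lookup = {frozenset(combo): name for name, combo in labels.items()}
    return lookup.get(frozenset(detected_dyes), "unknown")


dyes = ["dye1", "dye2", "dye3", "dye4", "dye5"]             # hypothetical dye names
chromosomes = [str(n) for n in range(1, 23)] + ["X", "Y"]  # 24 chromosome types
labels = assign_dye_combinations(chromosomes, dyes)

print(2 ** len(dyes) - 1)                  # 31 possible combinations
print(identify({"dye1"}, labels))          # 1
print(identify(set(labels["X"]), labels))  # X
```

The lookup step mirrors the role the text gives to the computer: once the mix of dyes at a spot has been measured, it can be translated back into a chromosome name and displayed in an assigned color.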


Extra Chromosome Copies Upset Gene 
Balance 


Genetic destiny is set at conception when the sperm and egg both contribute chromosomes. Some genetic disorders are caused by inheriting an extra copy of an entire chromosome, a condition called trisomy. Inheriting three copies of a chromosome and its genes (which are normally present in two copies) is tolerated only in rare cases, such as trisomy 21, Down syndrome (Figure 10.15). Depending on which chromosome is involved, inheriting an extra chromosome copy, or losing a chromosome copy, can be a lethal event. Down syndrome is the most common chromosome imbalance in humans (1 in 800 live births), affecting more than 350,000 people in the United States. Down syndrome occurs when an individual inherits an extra copy of chromosome 21 (trisomy 21), which causes mental retardation, distinctive facial features such as upward-slanting eyes, and heart problems. In 90% of Down syndrome cases, the extra chromosome copy was inherited from the mother’s egg, which explains why the incidence of Down syndrome increases with the age of the biological mother. Chromosome mis-segregation events become much more frequent with age, so the eggs produced by older women are more likely to have a chromosome imbalance even before fertilization occurs.
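The chromosome arithmetic behind trisomy can be traced with a toy model. A minimal sketch, assuming each gamete is just a tally of chromosome copies and that a mis-segregation (nondisjunction) event leaves one gamete with 2 or 0 copies of a chromosome instead of 1; the function names are hypothetical, and the model ignores crossing over and which meiotic division went wrong.

```python
from collections import Counter

AUTOSOMES = [str(n) for n in range(1, 23)]  # chromosomes 1 to 22


def gamete(sex_chromosome, nondisjunction=None, copies=2):
    """Chromosome copy numbers in one egg or sperm (a toy model).

    A normal gamete carries one copy of each autosome plus one sex
    chromosome: 23 in total. If `nondisjunction` names a chromosome that
    failed to separate during meiosis, the gamete carries `copies`
    (2 or 0) of that chromosome instead of 1.
    """
    counts = Counter({name: 1 for name in AUTOSOMES})
    counts[sex_chromosome] += 1
    if nondisjunction is not None:
        counts[nondisjunction] = copies
    return counts


def fertilize(egg, sperm):
    """The fertilized egg receives every chromosome from both gametes."""
    return egg + sperm


normal = fertilize(gamete("X"), gamete("Y"))
print(sum(normal.values()))            # 46

down = fertilize(gamete("X", nondisjunction="21"), gamete("X"))
print(sum(down.values()), down["21"])  # 47 3
```

An egg carrying two copies of chromosome 21 plus a normal sperm gives 47 chromosomes with three copies of chromosome 21, the trisomy 21 karyotype of Down syndrome.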
FIGURE 10.11 Mitosis is the process of cell division. (A) Chromosomes in cells in different stages of mitosis. Chromosomes change shape and position during the different stages of mitosis in the cell cycle. Tubulin proteins in fibers in the cells are stained green by antitubulin antibodies; the chromosomes are stained red. (B) Chromosomes line up in metaphase before they segregate. Chromosomes are moved (segregated) by spindle fibers that attach at the centromere of each chromosome. Accurate chromosome segregation ensures that each daughter cell will inherit the correct number of chromosomes. (C) Condensed mitotic chromosomes are narrow at the centromere. Each chromatid contains one linear (unbranched) double-stranded DNA molecule extending from one end to the other. In preparation for mitosis, the chromatin fibers are packed into condensed chromosomes. Shown are scanning electron micrographs of the human X and Y chromosomes.


The power of the sex chromosomes to influence trait development is demonstrated by people who inherit an imbalance of the X or Y chromosomes (normally XX in human females and XY in males). A female who inherits one X chromosome instead of two has Turner syndrome, which affects more than 60,000 women in the United States. Individuals who inherit one Y chromosome and two X chromosomes have Klinefelter syndrome (1 in 500 to 1000 male births); they develop as males but often have underdeveloped male secondary sex characteristics (for example, sparse facial, underarm, or pubic hair).
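Karyotype results are usually summarized as the total chromosome count followed by the sex chromosomes, for example 46,XX or 47,XXY. A minimal sketch that produces this shorthand and a one-line interpretation, assuming the 44 autosomes are normal so that only the X and Y chromosomes change the count; the function name is hypothetical and the interpretation is deliberately simplified.

```python
def describe_karyotype(sex_chromosomes):
    """Summarize a karyotype from its sex chromosomes, e.g. "XX" or "XXY".

    Simplifying assumption: the 44 autosomes (22 pairs) are normal, so
    only the X and Y chromosomes change the total count.
    """
    if not sex_chromosomes or set(sex_chromosomes) - {"X", "Y"}:
        raise ValueError("expected a string of X and Y characters")
    x = sex_chromosomes.count("X")
    y = sex_chromosomes.count("Y")
    total = 44 + x + y
    if (x, y) == (2, 0):
        note = "typical female"
    elif (x, y) == (1, 1):
        note = "typical male"
    elif (x, y) == (1, 0):
        note = "Turner syndrome"
    elif (x, y) == (2, 1):
        note = "Klinefelter syndrome"
    else:
        note = "other sex chromosome imbalance"
    return f"{total},{sex_chromosomes}: {note}"


for sex_chromosomes in ["XX", "XY", "X", "XXY"]:
    print(describe_karyotype(sex_chromosomes))
# 46,XX: typical female
# 46,XY: typical male
# 45,X: Turner syndrome
# 47,XXY: Klinefelter syndrome
```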


Mixed-Up Chromosome Sequences Cause 
Genetic Disorders and Cancer 


Human chromosome translocations involve breaking the DNA helix in one chromosome and attaching the broken segment to a completely different chromosome. A reciprocal translocation is an exchange of DNA segments between two chromosomes that usually occurs without the loss of any genetic information (Figure 10.16). This is because genes located completely within the boundaries of the translocated region of DNA are not affected by moving the DNA to a different chromosome. However, genes whose DNA sequences span the chromosome junctions (the breakpoints) will be disrupted by the translocation event, sometimes with devastating consequences. A good example is chronic myelogenous leukemia (CML), a type of blood cancer that causes the uncontrolled growth of white blood cells (leukocytes). About 5000 people are diagnosed with CML each year, and more than 20,000 people in the United States live with the disease.

FIGURE 10.12 Human chromosomes have distinct banding patterns. (A) The drawing shows the banding patterns that are characteristic of human chromosomes. (B) This typical karyotype display shows 46 chromosomes (22 pairs plus X, Y) from a male human.

FIGURE 10.13 “Painted” human chromosomes make a spectral karyotype. (A) Human chromosomes are “painted” with different DNA-specific fluorescent dyes that allow each chromosome and different regions on each chromosome to be identified by color (left: true color; right: pseudocolor). (B) Mitotic chromosomes from a human male are arranged into a spectral karyotype, emphasizing the similarly colored banding patterns exhibited by homologous chromosomes from the mother and father.

CML begins to develop when a chromosome translocation occurs in a single blood stem cell in the bone marrow (Figure 10.17A). In the first step of the translocation, the DNA helices in chromosomes 9 and 22 are broken, disrupting the ABL and BCR genes. In the next step, the broken end of chromosome 9 becomes attached to chromosome 22 (and vice versa), which effectively fuses the sequences encoding the beginning of the BCR gene to the end of the ABL gene, creating a new fusion gene, BCR-ABL. This fusion gene is carried on the Philadelphia chromosome, which is easily identified by karyotype analysis (Figure 10.17B).
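The two steps of a reciprocal translocation can be modeled by treating each chromosome as an ordered list of segments and swapping everything beyond the breakpoints. A minimal sketch under strong simplifying assumptions: the segment labels below are hypothetical stand-ins (not real gene coordinates), each gene is split into just a “start” and an “end” piece, and segment sizes are ignored, so the model does not show that the Philadelphia chromosome is shorter than normal.

```python
def reciprocal_translocation(chrom_a, break_a, chrom_b, break_b):
    """Toy reciprocal translocation: swap everything beyond each breakpoint.

    Chromosomes are lists of segment labels; break_a and break_b are the
    list indices where each chromosome is cut. Segments are rearranged,
    never lost.
    """
    derivative_a = chrom_a[:break_a] + chrom_b[break_b:]
    derivative_b = chrom_b[:break_b] + chrom_a[break_a:]
    return derivative_a, derivative_b


def junction(chromosome, break_index):
    """The two segments that now sit side by side at the breakpoint."""
    return chromosome[break_index - 1], chromosome[break_index]


# Hypothetical, highly simplified segment maps (not real gene coordinates).
chr9 = ["9-start", "ABL-start", "ABL-end", "9-end"]
chr22 = ["22-start", "BCR-start", "BCR-end", "22-end"]

# Break both chromosomes inside their genes (between index 1 and 2), then swap ends.
der9, der22 = reciprocal_translocation(chr9, 2, chr22, 2)
print(der22)               # ['22-start', 'BCR-start', 'ABL-end', '9-end']
print(junction(der22, 2))  # ('BCR-start', 'ABL-end'), the BCR-ABL fusion
print(sorted(der9 + der22) == sorted(chr9 + chr22))  # True: nothing lost
```

The last line checks the point made about reciprocal translocations: all segments survive, but the junction on the rearranged chromosome 22 joins the beginning of BCR to the end of ABL.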

In the bone marrow, the single leukocyte stem cell 
carrying the Philadelphia chromosome expresses the 
BCR-ABL fusion gene, which acts as an oncogene 
and promotes the development of cancer. The BCR- 
ABL fusion protein functions as a kinase enzyme that 
attaches phosphate groups to target proteins in the cell. 
FIGURE 10.14 Colored human chromosomes are visible in a spectral array (top). The colors exhibited by each chromosome can result from combinations of different fluorescent colors (bottom).
FIGURE 10.15


The phosphorylated target proteins become permanently
active, even though they normally function only when
the cell receives a signal to start cell division. The mutant 
BCR-ABL enzyme not only promotes rapid cell division, 
but it also protects the new cancer cells from committing 
suicide by programmed cell death (apoptosis). 

At this stage, CML is in its chronic phase. The cancer cells are still subject to partial cell cycle control, and occasionally they differentiate into mature cells that perform normal functions in the blood. However, the cancer cell carrying the Philadelphia chromosome can acquire a second mutation that activates another oncogene (ras) or destroys the function of the p53 tumor-suppressor gene, resulting in a large increase in the reproduction of the cancer cells. Cells that carry both mutations fail to differentiate and cannot function properly in the blood, bringing on the blast crisis phase of CML.
FIGURE 10.16 Reciprocal chromosome translocation. Reciprocal translocation of DNA between chromosomes 4 and 20 occurs without the loss of essential genes.


A Robertsonian translocation occurs when the long arms of two chromosomes break off and fuse together at a centromere, causing the loss of the genetic material on the two short arms. Surprisingly, there are no obvious negative consequences for people who inherit Robertsonian translocations involving chromosomes 13, 14, 15, 21, or 22, even though all the genes on the short chromosome arms are lost. This is because the short arms of these chromosomes carry mainly repeated genes, such as the ribosomal RNA genes, that are also present in many copies on the other chromosomes of this group. Karyotype analysis reveals that people with Robertsonian translocations have only 45 chromosomes in each cell (instead of 46), yet these individuals function normally. However, the biological children of people with Robertsonian translocations are at high risk of inheriting a chromosome imbalance, the wrong number of chromosomes per cell.


COMPARISON OF HUMAN GENOMES 
REVEALS IMPORTANT DNA DIFFERENCES 


The ability to make connections between certain human 
genes and specific human diseases yields important 
information that often accelerates the development 
of new medical treatments. It is accurate to say that 
we inherit two copies of each chromosome, one from 
our mother and one from our father, but the two chro- 
mosome copies are not completely identical in DNA 
sequence. Studies that compared the genome DNA sequences from many different people reveal that individual human genomes have single base pair differences at many sites along the genome DNA. By 2007, scientists had identified and cataloged more than 1.8 million single nucleotide polymorphisms (SNPs) across the human chromosomes.
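At its simplest, finding a SNP means lining up two copies of the same stretch of DNA and noting the positions where a single base differs. A minimal sketch, assuming two short, made-up sequences that are already aligned base for base (same length, no insertions or deletions); real SNP discovery compares many genomes and needs sequence alignment software, which this does not attempt. The function name is hypothetical.

```python
def find_snps(seq_a, seq_b):
    """List (position, base_a, base_b) wherever two aligned sequences differ.

    Positions are 1-based. Assumes the sequences are already aligned
    base for base (same length, no insertions or deletions).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


maternal_copy = "ATGCCGTAAGCTTGCA"  # made-up sequences
paternal_copy = "ATGCCGTAAGCTAGCA"
print(find_snps(maternal_copy, paternal_copy))  # [(13, 'T', 'A')]
```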
FIGURE 10.17 A chromosome translocation causes a human blood cancer. (A) The Philadelphia chromosome results from a reciprocal translocation that moves part of the long arm of chromosome 9 onto chromosome 22 and part of the long arm of chromosome 22 onto chromosome 9. This chromosome DNA rearrangement creates a fused gene (BCR-ABL), carried on the Philadelphia chromosome (Ph1), that acts as an oncogene and causes a blood cancer called chronic myelogenous leukemia (CML). (B) Philadelphia chromosome karyotype. The reciprocal translocation between chromosome 9 and chromosome 22, designated t(9;22), results in one chromosome 9 that is longer than normal and one chromosome 22 (Philadelphia chromosome: Ph1) that is shorter than normal.


Before the human genome sequence became available, genes were identified and mapped on the chromosomes using genetic information in combination with chromosome markers, usually restriction fragment length polymorphisms (RFLPs) (see Chapters 6 and 8) (Figure 10.18). Finding human genes by RFLP mapping is much more difficult than using SNPs because RFLP markers occur much less frequently in the human genome than SNPs and therefore provide much less information for the gene search. RFLP and SNP markers can be used to identify and locate genes on human chromosomes because the positions of many of these markers in the human genome DNA are known. The markers are used to identify which chromosomes and genes are consistently inherited by individuals who also inherit the particular disease under study. In other words, if a chromosome marker is inherited by the same family members who also inherit a genetic disease, then a possible link exists between that chromosome marker (or a gene mutation near it) and the disease. When an SNP marker and a particular gene are inherited together generation after generation, the SNP marker and the gene are considered to be genetically linked; the marker and the gene are located on the same chromosome, within a region containing about 5 million base pairs of DNA.
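Geneticists usually express “inherited together generation after generation” as a recombination fraction (how often crossing over separates marker and disease) and a LOD (logarithm of the odds) score, which compares how likely the family data are if the marker and gene are linked versus unlinked. A LOD score of 3 or more (odds of 1000 to 1 in favor of linkage) is the conventional threshold for declaring linkage. A minimal sketch, assuming phase-known, fully informative meioses and hypothetical family counts; real linkage software handles incomplete pedigrees and is far more involved. The final line uses a rough rule of thumb, labeled as an assumption in the code, that about 1% recombination corresponds to roughly 1 million base pairs in humans.

```python
import math


def recombination_fraction(recombinants, nonrecombinants):
    """Fraction of scored meioses in which marker and disease allele separated."""
    return recombinants / (recombinants + nonrecombinants)


def lod_score(recombinants, nonrecombinants, theta):
    """LOD = log10[ L(theta) / L(0.5) ] for phase-known, fully informative meioses.

    L(theta) = theta**R * (1 - theta)**NR is the likelihood if marker and gene
    are linked; theta = 0.5 means they are inherited independently.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be greater than 0 and at most 0.5")
    n = recombinants + nonrecombinants
    log_linked = (recombinants * math.log10(theta)
                  + nonrecombinants * math.log10(1 - theta))
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical family data: 20 children scored, 1 recombinant.
theta = recombination_fraction(1, 19)
print(theta)                              # 0.05
print(round(lod_score(1, 19, theta), 2))  # 4.3

# Assumption (rough rule of thumb): ~1% recombination ~ 1 million bp in humans.
print(round(theta * 100, 1), "million bp (very approximate)")
```

Here a LOD score above 3 supports linkage, and the estimated separation of a few million base pairs is of the same order as the region described in the text.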


Family Connections Are Important to the 
Search for Disease Genes 


This chapter describes the genetic mapping meth- 
ods used to identify genes involved in human genetic 


diseases. Behind the scenes in these studies are the 
people and their families who live every day with 
genetic diseases and disorders. Several groundbreak- 
ing genetic studies would not have been possible with- 
out the willing cooperation of many devoted patients 
and their families who continue to work to promote 
research to develop future treatments. Gene mapping 
studies often involve making a pedigree, a kind of 
biological family tree that illustrates a pattern of gene 
inheritance and the diseases exhibited by the family 
under study. This approach was used to identify many 
human genes including the gene associated with a 
devastating neurological disorder called Huntington’s 
disease, named for George Huntington, the American 
physician who studied the disease in the early 1900s. 
Probably best known as the disease that killed folk- 
singer Woody Guthrie in 1967, Huntington’s disease 
is an invariably fatal illness that slowly destroys brain 
function (Figure 10.19). People with Huntington’s dis- 
ease suffer progressive deterioration of the nervous 
system, starting with minor neurological symptoms but 
inevitably leading to involuntary thrashing and loss of 
motor control. These writhing movements earned the 
disease the name Huntington’s “chorea,” from the
Greek word for dance, choreia.


Mapping human genes depends on studying the incidence 
of the disease in members of a biological family because 
the pattern of gene inheritance will reflect any genetic 
link between a specific mutant gene and the disease in 
question. 
FIGURE 10.18 Detecting an RFLP in genome DNA. The first individual has a region of DNA with three cleavage sites for the HindIII restric-
tion enzyme extending over about 6 kilobase pairs (kb, or 6000 bp) in the genome. If we cut just that region of the DNA with HindIII, the result
would be two DNA fragments 4 kb and 2 kb in length, which add up to 6 kb in total. Analysis of the same region of DNA in the genome of a
second individual shows that one of the three HindIII sites has been altered by a mutation in the second genome, and as a result the DNA at that
site is no longer recognized or cut by the HindIII restriction enzyme. If we cut just that region of the DNA with HindIII, the result would be one
DNA fragment that is 6 kb in length. The DNA probe in this experiment allows the scientist to visualize only the desired DNA bands without
interference from all of the other DNA fragments generated by cutting all 46 chromosomes with HindIII.
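The logic of Figure 10.18 can be sketched as a small Python simulation. The sequences are toy examples scaled down from kilobases to a few hundred bases, not real genome DNA. The code assumes that HindIII recognizes AAGCTT and cuts between the two A's (A^AGCTT). It finds the cut sites and reports the lengths of the fragments between them, first for an intact region and then after a single-base change destroys the middle site.

```python
# Sketch of how a single-base change can create an RFLP.
# Assumption: HindIII recognizes AAGCTT and cuts after the first A (A^AGCTT).
# The sequences are toy examples, not real genome DNA.

SITE = "AAGCTT"
CUT_OFFSET = 1  # cut between the first and second A


def cut_positions(dna: str) -> list[int]:
    """Return the positions where HindIII would cut this DNA strand."""
    positions = []
    start = dna.find(SITE)
    while start != -1:
        positions.append(start + CUT_OFFSET)
        start = dna.find(SITE, start + 1)
    return positions


def fragments_between_sites(dna: str) -> list[int]:
    """Lengths of the fragments lying between consecutive cut sites."""
    cuts = cut_positions(dna)
    return [b - a for a, b in zip(cuts, cuts[1:])]


# Toy region with three HindIII sites; the 400- and 200-base fragments
# stand in for the 4 kb and 2 kb fragments in the figure.
filler_long = "G" * 394
filler_short = "C" * 194
individual_1 = SITE + filler_long + SITE + filler_short + SITE
# Individual 2: one base change (AAGCTT -> AAGATT) destroys the middle site.
individual_2 = SITE + filler_long + "AAGATT" + filler_short + SITE

print(fragments_between_sites(individual_1))  # [400, 200]
print(fragments_between_sites(individual_2))  # [600]
```

The total length is the same in both individuals. Only the number of fragments changes, which is exactly the difference a labeled probe reveals on a gel.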
FIGURE 10.20 The Gusella research group worked on Huntington’s
disease.


In the 1970s, geneticist James Gusella at Harvard
University and neuropsychologist Nancy Wexler at
Columbia University and colleagues began to look for
the human Huntington’s disease gene, a quest that led
them to Venezuela (Figure 10.20). They found a family
descended from a European woman who migrated
to Venezuela in the early 1800s and brought with her
the mutant gene that causes Huntington’s disease.


FIGURE 10.19 Folk singer Woody Guthrie died from Huntington’s
disease.
The research team led by Nancy Wexler traveled to
Venezuela to collect information and biological samples
from the villagers; the samples were then taken to the
Gusella lab at Harvard for analysis.

When this study began, the chromosomal location 
of the Huntington’s gene was not known, so the first 
goal of the research team was to find a DNA
probe that could reproducibly track the inheritance of the
Huntington’s gene on human chromosomes. This study predated
SNP gene mapping by many years, but at the time 
Gusella’s team had several human RFLP DNA probes 
available to test the dozens of DNA samples gathered 
from the people in Venezuela. One by one the team 
screened the DNA samples from Venezuela with each 
RFLP DNA probe to search for a link between the 
family members who inherit the mutant gene and the 
incidence of Huntington’s disease in the Venezuelan 
family. After the first several RFLP candidate probes 
failed the linkage test, Gusella decided that the next 
candidate probe would be named G8 after his lab 
technician, Ginger Weeks. The G8 DNA probe worked 
so well that it was used to accurately track the mutant 
Huntington gene through seven generations of the 
Venezuelan family! 

These experiments finally revealed the chromosome 
location of the Huntington’s disease gene, but it wasn’t 
until 1993 that an international team of research scien- 
tists including Gusella’s group announced that they had 
finally isolated (cloned) and sequenced the Huntington’s 
disease gene. They found that the wildtype Huntington’s 
gene encodes the huntingtin protein, which normally 
functions in nerve cells in the brain. However, despite 
the successful identification and cloning of the gene 
for the huntingtin protein, over a decade later there 
was still no effective treatment or cure for people who 
inherit this devastating disease. Nancy Wexler returns 
frequently to Venezuela to add more data to the grow- 
ing pedigree chart of the family with Huntington’s dis- 
ease that covers both walls of the hallway outside her 
office. This work holds a personal interest for Wexler,
who is also at risk for inheriting the mutant Huntington’s
disease gene. More about the Huntington's disease gene 
is presented later in this chapter. 


CARRIERS OF GENETIC DISEASES 
HAVE A MUTANT GENE BUT DO NOT 
GET SICK 


More than 30,000 children and adults in the United
States live with cystic fibrosis, an inherited disease that
causes a persistent cough, constant lung infections,
shortness of breath, and poor growth. There is no cure
for cystic fibrosis, but due to advances in medical
treatments many people with cystic fibrosis now live into
their forties (median predicted survival age 37 years).
The gene that causes cystic fibrosis disease is called the 
cystic fibrosis transmembrane conductance regulator 
(CFTR) gene (Figure 10.22A). The wildtype CFTR gene 
is written in all capital letters, and the mutant allele is
written in lowercase letters (cftr). To develop the symptoms
of cystic fibrosis disease, a person must inherit two
copies of a mutant cystic fibrosis gene (cftr/cftr). 

Healthy human cells must constantly regulate the 
internal concentrations of salts and many other poten- 
tially harmful compounds. As you might guess from 
the name, cystic fibrosis transmembrane conductance 
regulator, the CFTR gene codes for a key protein that 
builds channels in the cell membrane to export excess 
salt from the cell. In people with cystic fibrosis disease, 
the defective proteins cannot build functional mem- 
brane channels and as a result the cells fail to properly 
regulate internal salt concentrations. 

More than 10 million Americans carry one wildtype 
copy (CFTR) and one mutant (cftr) copy of the cystic 
fibrosis gene and are genetic carriers of cystic fibrosis 
disease. Carriers of cystic fibrosis do not suffer from
disease symptoms themselves, but they can pass the
mutant cftr gene to their children. A child who inherits
a mutant cftr allele from each parent (cftr/cftr) will
become sick with cystic fibrosis disease because the 
child’s cells lack the normal gene and can make only 
the defective cftr proteins. People who are carriers of 
cystic fibrosis disease do not become sick because the 
normal (CFTR) and mutant (cftr) forms of the CF pro- 
teins are both made in the cells of a CF carrier. As a 
result, the normal CFTR proteins can build functional 
membrane channels and properly regulate internal salt 
concentrations, even in the presence of mutant, non- 
functional cftr proteins (Figure 10.22B). In genetic ter- 
minology, the wildtype CFTR allele is dominant over 
the recessive mutant cftr allele. 
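The carrier-by-carrier cross shown in Figure 10.22B can be worked out as a Punnett square in Python. This is a minimal sketch that assumes each parent passes on either of its two alleles with equal probability. It counts the four equally likely allele combinations and labels each genotype using the dominant/recessive rule described above.

```python
# Sketch of a Punnett square for two cystic fibrosis carriers.
# Assumption: each parent passes on either allele with equal probability.
from collections import Counter
from itertools import product


def cross(parent1: tuple[str, str], parent2: tuple[str, str]) -> Counter:
    """Count the four equally likely allele combinations for a child."""
    children = Counter()
    for a, b in product(parent1, parent2):
        # Sort so CFTR/cftr and cftr/CFTR count as the same genotype;
        # uppercase sorts before lowercase, so the dominant allele comes first.
        genotype = "/".join(sorted((a, b)))
        children[genotype] += 1
    return children


def phenotype(genotype: str) -> str:
    """Apply the rule that one wildtype CFTR allele is enough to stay healthy."""
    alleles = genotype.split("/")
    if "CFTR" not in alleles:
        return "cystic fibrosis"
    if "cftr" in alleles:
        return "carrier (no symptoms)"
    return "unaffected, not a carrier"


mother = ("CFTR", "cftr")
father = ("cftr", "CFTR")
for genotype, count in sorted(cross(mother, father).items()):
    print(f"{genotype}: {count}/4 -> {phenotype(genotype)}")
```

The output matches the figure: one CFTR/CFTR child, two CFTR/cftr carriers, and one cftr/cftr child with cystic fibrosis, a 1-in-4 chance for each pregnancy.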


Carriers of cystic fibrosis do not become sick because their 
cells carry both the mutant and normal alleles of the CFTR 
gene and make both the mutant and normal forms of the 
proteins; the wildtype CFTR proteins are sufficient to com- 
pensate for the lack of function by the defective cftr proteins. 


Box 10.1 Get to Know Nancy Wexler: Huntington Gene Hunter


If you are a baby boomer, you might remember seeing
Nancy Wexler years ago when she appeared on television
on 60 Minutes in “Gene Hunter,” a show that described
Nancy’s search for the gene that causes Huntington’s disease.
It took 10 years of research and the assistance of the entire
Venezuelan village of Laguneta, but finally, in 1983, Nancy
Wexler and her colleagues discovered the chromosomal
location of the Huntington’s disease gene (Figure 10.21).

What led Nancy Wexler to become interested in searching
for the genes that cause genetic diseases? Growing up, Nancy
developed a strong interest in science, especially psychology.
She attended Harvard University’s Radcliffe College, where
she studied literature, anthropology, and psychology. Later,
Nancy studied in London under Anna Freud, the daughter
of the famous Dr. Sigmund Freud.

Huntington’s disease has been a part of Nancy’s life as
long as she can remember. Nancy’s three uncles died from
Huntington’s disease, and her mother, who also developed
this devastating disease, died when Nancy was 32. Nancy still
faces this personal challenge; she and her older sister Alice
are at risk of inheriting the disease. In 1969, when Nancy first
began studying Huntington’s disease, there was no way to
know which children born to parents with Huntington’s
disease might inherit the gene that causes the disease. Nancy
joined other scientists dedicated to finding the gene that
causes Huntington’s disease, with the hope that finding the
gene might lead to effective treatments and eventually a cure.

Nancy Wexler led a team of scientists who traveled to
Venezuela to collect dozens of biological samples from
people in the village of Laguneta, people who potentially carry
the Huntington’s disease gene. The team kept careful records
on the people who donated the samples, including medical
histories and exams, gathering information that was critically
important for the scientists to correlate candidate genes
with the occurrence of Huntington’s disease. The dedicated
efforts of many people paid off in 1983, when the chromosomal
location of the mutant Huntington’s gene was determined
using the genetic studies and biological samples provided
by Nancy Wexler, her team of scientists, and the people of
the Venezuelan village.

Unfortunately there is as yet no effective treatment
or cure for this devastating disease, even though researchers
were able to isolate and characterize the Huntington’s
disease gene. Now, because of scientists like Nancy, people at
risk for Huntington’s disease can take a blood test to find out
if they have inherited the mutant gene. Whether or not to
take this test is a difficult decision for the person at risk to
make, because a positive test result means you will develop a
terrible disease with no treatment and no cure. What would
you do?


FIGURE 10.21 Nancy Wexler, Gene Hunter, helped to identify and
isolate the Huntington’s disease gene.


Muscular dystrophies are not a single disease called
Muscular Dystrophy but instead represent a group of 30
genetic diseases that cause progressive degeneration of
the muscles that control body movements. Duchenne
muscular dystrophy (DMD) is the most prevalent type of
childhood muscular dystrophy and was named
for Guillaume Duchenne, the French neurologist who
described the condition in 1868. DMD causes rapid
degeneration of muscle tissue early in life, striking
children at about 3 years of age and gradually causing
unsteady movements as they lose muscle strength and control.
Often confined to a wheelchair by age 10, children
with DMD usually die in their teens or early twenties.
There is no cure for DMD.

Hopes for a cure were raised when the human dystrophin
gene, which is mutated in people with DMD, was first
identified in 1986. Unfortunately, DMD is a good example
of why identifying the gene responsible for a genetic disease
is not always enough to lead to a rapid cure. One reason is
that the dystrophin protein is encoded by the largest known
human gene, which extends over more than 2 Mb
(megabases, or 2 million bases) of human genome
DNA and contains 79 exon sequences (Figure 10.23).
Different mutations in the dystrophin gene are responsible
for either the most severe form of the disease,
Duchenne muscular dystrophy, or a much milder
form, Becker muscular dystrophy. At
first scientists could not explain why the mutation that
removed the largest amount of DNA actually causes
the mild disease, while a mutation removing a much
smaller region of DNA causes the severe form of the
disease (Figure 10.24A). Further study of the expression
of the mutant alleles revealed that the mutant
dystrophin RNA transcripts were processed by RNA
splicing into different mutant mRNAs, whose coding
sequences are translated into the two mutant dystrophin
proteins associated with the mild and severe forms of
muscular dystrophy (Figure 10.24B).


FIGURE 10.22 (A) Mutation in the CFTR gene causes cystic fibrosis. (1) The normal cystic fibrosis gene, CFTR, is on human chromosome 7.
(2) CFTR gene DNA is copied into prespliced RNA. (3) CFTR prespliced RNA is spliced into mRNA. (4) mRNA is translated into the CFTR
protein, which weaves back and forth through the cell membrane to build the CFTR protein channel needed to regulate the amount of salt inside
the cell. (B) Genetic carriers of cystic fibrosis inherit one mutant cftr gene. Cystic fibrosis is caused by a recessive mutation in the CFTR gene;
it is recessive because two copies of the mutant cftr gene must be present for the disease to develop (cftr/cftr). The mother (CFTR/cftr)
and father (cftr/CFTR) are both carriers of cystic fibrosis but are healthy because each carries a normal, dominant CFTR gene that makes up for
the single copy of the mutant cftr gene. The children inherit genes in pairs, one copy from each parent, so there are four possible genotypes
for the children of these parents. Three of the four are normal because they carry at least one copy of the dominant CFTR gene; two of these
three are carriers because they carry one normal and one mutant copy of the gene (CFTR/cftr). The fourth inherits two copies of the
mutant gene (cftr/cftr) and becomes ill with cystic fibrosis. Key: CFTR (wildtype gene); cftr (mutant gene); CFTR/CFTR (no cystic fibrosis);
cftr/cftr (cystic fibrosis disease); CFTR/cftr (CF carrier; no symptoms).


Most human genes are copied into precursor RNAs con- 
taining both introns and exons. Successful RNA processing 
(splicing) removes the introns from each precursor RNA, 
and links the exons together, in the correct order, to pro- 
duce a mature mRNA containing just exon protein coding 
regions. 
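RNA splicing can be sketched as cutting exon segments out of a precursor RNA and joining them in order. In the toy example below, the precursor sequence, the exon coordinates, and the exon lengths are all hypothetical. The second function illustrates the reading-frame rule, a widely cited explanation for the Becker/Duchenne puzzle described above that is not spelled out in this chapter. The rule works as follows. Because mRNA is read three bases at a time, a deletion that removes a multiple of 3 bases keeps the downstream codons in frame. Such a deletion yields a shorter but partly working protein, which fits the milder Becker pattern. A deletion of any other size shifts the frame and garbles everything downstream, which fits the severe Duchenne pattern. This holds for most, though not all, patients.

```python
# Toy sketch of RNA splicing and of the reading-frame rule for exon deletions.
# The precursor sequence, exon coordinates, and exon lengths are hypothetical.


def splice(precursor: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (start, end; end-exclusive) in order, dropping introns."""
    return "".join(precursor[start:end] for start, end in exons)


def keeps_reading_frame(deleted_exon_lengths: list[int]) -> bool:
    """A deletion keeps the codon reading frame if it removes a multiple of 3 bases."""
    return sum(deleted_exon_lengths) % 3 == 0


# Uppercase = exons, lowercase = introns (each intron starts with gu and ends with ag).
precursor = "AUGGCU" + "guaaguuuag" + "GAAUUC" + "guaagcag" + "UAA"
exons = [(0, 6), (16, 22), (30, 33)]
print(splice(precursor, exons))  # AUGGCUGAAUUCUAA

# Hypothetical deletions: a large one removing exons of 120 and 150 bases,
# and a small one removing a single 100-base exon.
print(keeps_reading_frame([120, 150]))  # True: 270 bases = 90 whole codons
print(keeps_reading_frame([100]))       # False: the frame shifts by one base
```

The point of the second check is that the size of a deletion matters less than whether it preserves the reading frame.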


DMD, the most common form of childhood mus- 
cular dystrophy, affects 1 in 3500 males worldwide but 
rarely affects females. Why? The answer lies in the fact 
that dystrophin is an X-linked gene; it is located on the 
X chromosome. Human males inherit one X and one 
Y chromosome, whereas females inherit two X chro- 
mosomes. In the case of X-linked genetic mutations, 
females often have an advantage over males. A female
who inherits a mutant gene on one copy of the
X chromosome usually also inherits a normal (wildtype)
version of that gene on her second X chromosome,
and one normal copy of the gene might be sufficient
to compensate for the deficit caused by the mutant gene.
A female develops the disease only in the rare case that
both of her X chromosomes carry the mutant gene.
In contrast, the lack of a second X chromosome, and
therefore of a wildtype allele, in XY males means that the
detrimental consequences of the mutant gene cannot be rescued.

Human males inherit only one copy of the X chromosome,
so they carry only one version of each gene
on the X chromosome, either mutant or wildtype. As a
consequence, when a male inherits an X-linked disease
such as DMD, there is no backup copy of the X chromosome
in his cells that could provide a normal
version of the dystrophin gene and protein. For this
reason, boys with the X-linked disease cannot make
normal dystrophin proteins.

Having no second, backup copy of the X chromosome
means that disease genes located on the X chromosome
(X-linked) predominantly affect boys rather than girls.
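The pattern described above can be checked by enumerating the children of one hypothetical family. This minimal Python sketch assumes the mother is a carrier with one X chromosome bearing a mutant dystrophin gene (labeled "Xd" here, an illustrative label) and the father is unaffected. Each parent contributes one sex chromosome with equal probability.

```python
# Sketch of X-linked recessive inheritance (e.g., DMD) in one family.
# Assumption: the mother is a carrier and the father is unaffected.
# "Xd" is an illustrative label for an X chromosome carrying the mutant gene.
from itertools import product

mother = ["X", "Xd"]
father = ["X", "Y"]


def describe(child: tuple[str, str]) -> str:
    from_mother, from_father = child
    if from_father == "Y":  # a son's only X comes from his mother
        return "affected son" if from_mother == "Xd" else "unaffected son"
    # A daughter has a second X, which can supply a normal gene.
    if from_mother == "Xd" and from_father == "Xd":
        return "affected daughter"
    if "Xd" in child:
        return "carrier daughter"
    return "unaffected daughter"


outcomes = [describe(c) for c in product(mother, father)]
print(outcomes)
```

Of the four equally likely outcomes, one is an affected son and one is a carrier daughter, and no daughter is affected. A daughter would be affected only if her father also passed on a mutant X.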
FIGURE 10.23 The dystrophin gene has 79 exons. It is transcribed into precursor RNAs, which are spliced to remove the introns. The numerous
exons are retained in the spliced mRNA. Dystrophin is a rod-shaped protein containing 3684 amino acids.
FIGURE 10.24 (A) Dystrophin protein connects a complex in the muscle membrane to actin filaments. One end of the normal dystrophin protein
binds to the actin filaments, and the other end attaches to the sarcolemma membrane proteins. (B) Dystrophin mutations cause the molecular
machine in the muscles to malfunction. Mutations that shorten the dystrophin helix cause the milder form, Becker muscular dystrophy, because
the shortened mutant protein can still link the actin filaments to the sarcolemma membrane, although less effectively. Mutations that alter the
actin-binding region of the dystrophin protein prevent the complex from becoming anchored to the actin filaments, which causes the most severe
form, Duchenne muscular dystrophy.
Triplet Repeat Mutations Cause Many
Different Genetic Diseases


The triplet repeat diseases (TRD) are grouped together 
even though they involve a variety of different genes 
and diseases. This is because the mutant forms of the
TRD genes all have the same unusual type of DNA
mutation: a sequence of three DNA bases (e.g., CCG) is
repeated in tandem, often dozens to hundreds of times,
in the mutant gene (e.g., CCG CCG CCG CCG CCG).
The normal (wildtype) allele of the gene usually
contains only a few triplet repeats, but in the
TRD mutants these few repeats have expanded
to create a mutant gene containing very large numbers
of tandem triplet repeats. In many cases, the TRD
mutation is located in the coding region of the gene,
which means that the mutant triplet repeat sequence is
translated into a long stretch of the same amino acid
repeated many times in the mutant protein.
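Counting tandem repeats is a simple string-scanning task. The Python sketch below reports the longest uninterrupted run of a given triplet in a DNA sequence. It checks all three starting offsets, because a repeat does not have to begin at a multiple of three. The "wildtype" and "expanded" sequences are made-up examples built around the CCG triplet used in the text.

```python
# Sketch: find the longest uninterrupted run of a triplet (e.g., CCG) in DNA.
# The example sequences are made up for illustration.


def longest_tandem_run(dna: str, triplet: str) -> int:
    """Return the largest number of back-to-back copies of `triplet` in `dna`."""
    size = len(triplet)
    best = 0
    for frame in range(size):  # try each of the three starting offsets
        run = 0
        for i in range(frame, len(dna) - size + 1, size):
            if dna[i:i + size] == triplet:
                run += 1
                best = max(best, run)
            else:
                run = 0  # the run is interrupted
    return best


wildtype = "ATG" + "CCG" * 5 + "GAT"
expanded = "ATG" + "CCG" * 200 + "GAT"
print(longest_tandem_run(wildtype, "CCG"))  # 5
print(longest_tandem_run(expanded, "CCG"))  # 200
```

Real genotyping measures repeat length from PCR products or sequencing reads, but the idea of counting back-to-back copies is the same.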

Huntington’s disease, discussed earlier in this chapter,
is caused by a TRD mutation (an expanded run of CAG
triplets) in the coding region of the huntingtin gene.
As a result, mutant huntingtin proteins contain regions
where the amino acid glutamine is repeated many dozens
of times. It is not yet clear why this extremely
glutamine-rich region destroys the function of the
huntingtin protein and causes a degenerative
and fatal nerve disease. However, there is a correlation
between the severity of an individual’s Huntington’s
disease symptoms and the number of triplet repeats
contained in that individual’s huntingtin genes. The
number of repeats might also help explain the age of
onset of symptoms and the rate of disease progression,
which both vary considerably in Huntington’s disease.
Studies show that patients whose symptoms begin earliest
and progress most rapidly tend to have inherited huntingtin
alleles that encode the largest numbers of triplet repeats,
and their cells produce mutant huntingtin proteins containing
very long stretches of glutamine residues.
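Why a CAG expansion becomes a glutamine expansion can be shown with a toy translation in Python. The CAG codon codes for glutamine (Q). To keep the sketch short, the codon table contains only the four codons this example needs, so it is not the full genetic code. The "normal" and "mutant" sequences and their repeat counts are illustrative, not real huntingtin gene sequence.

```python
# Sketch: translating an expanded CAG repeat gives a long run of glutamine (Q).
# Only the codons needed for this toy example are included; a real program
# would use the full genetic code. The sequences are illustrative.

CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "CAG": "Q",  # glutamine
    "CCG": "P",  # proline
    "TAA": "*",  # stop
}


def translate(coding_dna: str) -> str:
    """Translate codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


normal = "ATG" + "CAG" * 20 + "CCG" * 3 + "TAA"
mutant = "ATG" + "CAG" * 60 + "CCG" * 3 + "TAA"
print(translate(normal))
print(translate(mutant).count("Q"))  # 60
```

Each extra CAG triplet in the gene adds exactly one more glutamine to the protein, which is why repeat number in the DNA maps directly onto the length of the polyglutamine stretch.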

Preliminary studies suggest that the mutant TRD
proteins might bind directly to an enzyme called
glyceraldehyde-3-phosphate dehydrogenase (GAPDH),
which normally helps break down glucose (sugar) to
release energy in all cells of the body. Researchers propose
that abnormally tight binding between the mutant TRD
proteins and the GAPDH enzymes in brain cells might
block GAPDH enzyme activity, which would inhibit
energy production in nerve cells and cause brain tissue
to atrophy. Interestingly, preliminary lab experiments
show that the mutant proteins containing the longest
stretches of a single amino acid bind GAPDH more
tightly than the normal protein does.
This result could have wide-reaching consequences
because a similar TRD mechanism might
be the underlying cause of several different but related
degenerative neurological disorders, including myotonic
dystrophy, spinobulbar muscular atrophy, Friedreich’s
ataxia, spinocerebellar ataxia, and some forms of
muscular dystrophy (Table 10.3).


Many TRD mutations alter the protein coding regions of genes,
changing the structure and properties of the mutant protein,
which can lead to degenerative neurological diseases such as
Huntington’s disease.


During the 1990s, autism became the fastest-growing 
developmental disability in the United States, increas- 
ing 172% at a time when the U.S. population increased 
by only 13%, and occurring in 1 in 150 births by 
2007. The emotional and financial cost to patients and 
families is enormous and the cost to society every year 
is about $90 billion and is expected to grow to $200 
billion to $400 billion by 2017. The reason for the 
startling increase in children diagnosed with autism is 
not clear, but it is not explained solely by the increase 
in public awareness of the disease. In addition, scien- 
tific evidence does not support a link between autism 
and childhood vaccines. 

Concern about the increase in autism has helped 
to raise funding and support for new research on 
autism including studies aimed at identifying the genes 
involved in autism and related disorders. A consor- 
tium of 11 scientific institutions began a comprehen- 
sive gene linkage study designed to identify all of the 
human genes involved in autism. Early results from this 
research suggest that two proteins in the brain called 
neuroligin-1 and neuroligin-2 are involved in autism in 
humans. In rats, the same proteins play a critical role in 
the development of nerve cell connections in the brain. 
The neuroligin proteins function at the junction where 
two nerve cells meet, the synapse, and enable nerves 
to make connections with other nerves. One neuroli- 
gin protein increases the excitability of the nerve cells, 
while the other neuroligin protein inhibits nerve cell 
activity. This important balance in nerve cell activity is
thought to be disrupted in autism.

A TRD mutation in the FMR1 gene is the most common
inherited cause of mental retardation and the most common
known single-gene cause of autism in humans.
Unlike mutations that alter the coding region of a gene,
the FMR1 mutation consists of more than 200 tandem repeats
of the CGG triplet located in the control region of the
gene. This mutation prevents the FMR1 gene from being
transcribed into RNA or expressed as protein, causing
the developmental disorder called fragile X syndrome.
The FMR1 protein normally regulates the translation
of certain mRNAs, whose protein products are
needed for nerve cell development. The extremely long
stretch of CGG DNA repeats in the mutant FMR1 gene
distorts the physical conformation of the DNA double
helix in the chromosome, blocking gene expression and
preventing the DNA from being copied into mRNA.
Because the mutant FMR1 gene is not transcribed, no
FMR1 protein is made in the fragile X cells.
The cells carrying the FMR1 mutation also exhibit broken
X chromosomes because the long stretch of CGG
base pairs in the FMR1 DNA weakens the chromosome
structure at the site of the mutation (Figure 10.25).


TABLE 10.3 Human diseases caused by triplet repeat mutations

Triplet repeat disease | Description | Symptoms
Fragile X syndrome | Inherited mutant gene on X chromosome; most common inherited form of mental retardation; males typically affected more than females | Mental retardation, attention deficit hyperactivity, anxiety, and autism; moodiness; large chin and ears
Friedreich’s ataxia | Inherited progressive disease damages nervous system; often scoliosis; heart disease | Difficulty walking; clubfoot, hammer toes; muscle loss in feet, legs, and hands; loss of knee and tendon reflexes
Muscular dystrophy | Family of several inherited muscle-destroying disorders | Progressive weakness, loss of muscle strength and movement; specific muscles involved and disease progression depend on type of muscular dystrophy
Myotonic dystrophy | Most common form of adult muscular dystrophy; type 1 and type 2 | Weakness and loss of muscle control in lower legs, hands, neck, face; muscles fail to relax after use; cardiac defects
Spinobulbar muscular atrophy | Inherited X-linked recessive disorder; neuromuscular disease affects spinal and bulbar neurons; primarily affects males | Tremors, cramps, muscular atrophy, decreased motor function; primary sensory neuropathy (nerve degeneration)
Spinocerebellar ataxia | Describes many inherited genetic disorders that are characterized by loss of coordination | Progressive uncoordinated gait, hands, speech, and eye movements; atrophy of the cerebellum; symptom progression depends on disease type
Huntington’s disease | Degenerative, fatal nerve and brain disease; monogenic (caused by a single gene) | Chorea (uncontrollable jerky movements), balance and coordination problems, trouble shifting eyes without moving head, dementia, seizures; Parkinson-like symptoms, muscle rigidity, and tremors


Hundreds of tandem CGG repeats in the FMR1 gene con- 
trol region distort the local chromosome structure, which 
not only prevents FMR1 gene expression, but also cre- 
ates a “fragile site” in the X chromosome. The X chromo- 
some DNA helix breaks easily at the position of the FMR1 
mutant gene. 
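FMR1 alleles are often grouped by CGG repeat number. The Python sketch below sorts a repeat count into categories. Only the full-mutation threshold of roughly 200 repeats comes from the text. The lower cutoffs (normal up to 44, intermediate 45 to 54, premutation 55 to 200) are an assumption that follows commonly cited clinical categories, and this is not a diagnostic tool.

```python
# Sketch: sorting FMR1 alleles by CGG repeat number.
# Assumption: the lower cutoffs follow commonly cited clinical categories;
# only the "more than 200 repeats" full-mutation threshold appears in the text.


def classify_fmr1(cgg_repeats: int) -> str:
    if cgg_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if cgg_repeats <= 44:
        return "normal"
    if cgg_repeats <= 54:
        return "intermediate"
    if cgg_repeats <= 200:
        return "premutation"
    return "full mutation (FMR1 usually silenced)"


for n in (30, 50, 100, 250):
    print(n, classify_fmr1(n))
```

Ordering the checks from smallest to largest cutoff means each repeat count falls into exactly one category.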


FIGURE 10.25 The fragile X mutation causes the X chromosome to
physically break at the fragile site in the DNA.


AMERICANS SEEK INFORMATION ON
GENES, HEALTH, AND BIOMEDICAL
RESEARCH


Recently the American public has become more interested
in gaining a better understanding of the human
body, how it works, and how genes and environment
contribute to health, happiness, and longevity. Public
interest increased in the 1990s with media coverage
of the Human Genome Project, including an ongoing
debate over the total number of human genes needed
to design and assemble a person. By the early 2000s,
new technologies such as nuclear transfer (animal
cloning; see Chapter 14) and embryonic stem cell
research (see Chapter 12) were fueling often contentious
public discussions about the scientific and ethical
issues raised by biomedical research.
It is important that the public appreciate the direct
connection between human genes and human health 
so that people will understand the relationship between 
basic medical research, clinical testing, and the 
development of new treatments. In addition, support 
services such as health education and genetic coun- 
seling are increasingly important to help patients and 
their families deal with the emotional and financial 
impact that a genetic disease places on an entire fam- 
ily. Many people now use their computers to access 
information on medicine and health; the World Wide 
Web (Internet) has dramatically changed how we do 
home research on everything, including health and 
medicine. Many excellent Internet web sites provide 
reliable information about common and rare genetic 
diseases (see the Additional Reading section at the 
end of this chapter). Information about web sites dedicated
to searching databases and manipulating DNA
sequences is included in the bioinformatics chapter (Chapter 7),
and online resources for visualizing biomolecules in 
three dimensions are described in Chapter 2. 


Americans now use the Internet as a primary source of 
information on health topics. But buyer beware: it is often
difficult to tell online fact from fiction, and it is
easy to make poor health choices based on faulty or out- 
dated medical information available online. 


The Dolan Learning Center at Cold Spring Harbor 
Laboratory (CSHL) (Cold Spring Harbor, New York) is 
an excellent resource on genetic diseases that offers 
clear explanations and animations covering every- 
thing from the basics on DNA to detailed information 
on new advances in biotechnology and medicine. 
Your Genes Your Health provides up-to-date, reliable 
information about 15 genetic diseases including some 
discussed in this chapter. The Cold Spring Harbor web 
site also offers clear basic information about genes 
in DNA from the Beginning. Cold Spring Harbor 
Laboratory is an acclaimed biomedical research center 
that also offers excellent science education programs 
for children. There is free online access to information 
about the CSHL scientists and their research, including 
the international SNP consortium, which is responsible 
for cataloging the human genome SNP markers used 
to construct the haplotype map or “HapMap” of the 
human genome. 

The HapMap will be an important genetic tool 
for researchers to use DNA genome variations to find 
genes involved in the complex diseases such as dia- 
betes, which are caused by multiple genes. The DNA 
sequences of any two people are almost identical, but 
the tiny variations in the DNA add up to what makes 
us different and unique and explain why some people 
are at risk for heart disease while others are more
likely to get cancer. When the SNPs are positioned 
near each other on the same chromosome DNA, they 
are inherited together and comprise a haplotype block. 
When complete, the HapMap will show the locations 
of all the haplotype blocks in the human genome. 
Researchers will scan the HapMap to find genes that 
cause complex human diseases, to identify genetic 
factors that cause susceptibility to infection, and 
to predict in advance the risk of adverse reactions to 
drugs and vaccines, all because scientists mapped the 
tiny differences among us to benefit everyone. 

The University of Utah is a top-notch center for 
human genetics that maintains an excellent web site with
readable educational information on the nuts
and bolts of human genetics. At the University of Utah
Genetic Science Learning Center, the public can find
accurate information about many topics such as calculat- 
ing genetic risk, genetic counseling, personalized medi- 
cine, genetics and the brain, and much more. The web 
sites for foundations such as the Muscular Dystrophy 
Association, the National Fragile X Foundation, the 
Cystic Fibrosis Foundation, and the March of Dimes are 
all excellent sources of current, accurate information. 
These sites are often maintained by people with personal 
connections to the disease and can be the best sources 
of information about new treatments, organization activ- 
ities, and patient and family support groups. 

The Internet provides information on just about eve- 
rything, but it is important to realize that the informa- 
tion on the Internet is not necessarily accurate (even 
if it says so in Wikipedia). For scientifically accurate
information, the public can search PubMed, a huge
international database of published research papers
in biomedicine and the life sciences (see Chapter 7).
Some articles require journal subscriptions for access 
to the entire paper, but many scientists pay to allow 
free access to their published reports, so the public 
can find reports on basic biomedical research from 
around the world. Be forewarned that the majority of 
the research papers include highly technical descrip- 
tions of experiments and results, but they often include 
references for review papers that offer approachable 
summaries of the research field. The references listed at 
the end of each paper can be directly accessed online. 


The U.S. government maintains excellent web sites to help the public navigate some complicated medical issues and to connect people with experts in every area of medicine and disease. Resources feature human genes and diseases, updates on genetic counseling and human clinical trials, reports about new drugs in the pharmaceutical pipeline, and more.
SUMMARY


In this chapter we discussed the important roles that mutant human genes play in the inheritance of genetic diseases, whether caused by a mutation in a single gene or by the actions of many defective genes. A mutation in a gene can alter the product of the gene, usually a protein, which often prevents the mutant protein from functioning in the cell. Mutations in genes can be inherited or can be the result of a spontaneous DNA change caused by environmental insults to the genome. For scientists to understand diseases caused by mutant genes, they need to understand how the gene and the gene product function in normal cells. DNA differences between human genomes have been identified and are used in RFLP or SNP studies to identify the locations of genes on the chromosomes and to trace the inheritance of genetic diseases from generation to generation.

Scientists can tell a lot about genetic diseases by examining condensed human chromosomes in a karyotype display. Some disorders are caused by inheriting extra chromosome copies, whereas others occur because of large DNA rearrangements that alter the entire chromosome structure. Many genetic disorders result from tiny changes in the genomic DNA that can be detected only with specific DNA probes. DNA probes are essential tools used in many methods, including gene searches and RFLP/SNP mapping. A DNA probe is typically a short, single-stranded DNA molecule containing a specific DNA sequence that is designed to bind to (base pair with) its complementary target sequence in the chromosome. Because the probe carries a detectable label, its binding to the target DNA generates a signal that marks the physical location of the target sequence in the chromosome.
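The base-pairing rule behind DNA probes can be illustrated with a short program. The Python sketch below is a toy model, not a laboratory method: it assumes only perfect matches count (real probes can tolerate some mismatches), searches just one strand of a made-up "chromosome" sequence, and uses hypothetical function names.

```python
# Toy model: a DNA probe binds where a chromosome strand is complementary to it.
# Assumptions: perfect matches only, one strand searched, made-up sequences.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def probe_binding_sites(strand: str, probe: str) -> list[int]:
    """Return 0-based positions on `strand` where `probe` can base-pair perfectly.

    Because paired strands run antiparallel, the probe pairs wherever the
    strand reads as the probe's reverse complement.
    """
    target = reverse_complement(probe)
    strand = strand.upper()
    sites = []
    start = strand.find(target)
    while start != -1:  # str.find returns -1 when there are no more matches
        sites.append(start)
        start = strand.find(target, start + 1)
    return sites


chromosome = "GGATCCTTAGCATGCAAATTC"  # made-up sequence
probe = "GCATGCTAA"                    # hypothetical probe
print(probe_binding_sites(chromosome, probe))  # [6]
```

The probe "GCATGCTAA" pairs with the stretch "TTAGCATGC" beginning at position 6, because each A faces a T and each G faces a C when the two strands are lined up in opposite directions.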

Genetically linked markers (genes) are almost always inherited together and are used to make correlations between a mutant gene on a specific chromosome and the inheritance of a specific genetic disease. Over the years, thousands of dedicated people with genetic diseases and their families have participated in studies designed to trace genetic diseases through family generations. New information and new genes have come from these studies, but progress toward novel treatments for these human genetic diseases lags behind.


REVIEW 


To test your knowledge of the concepts in this chapter, 
answer the following questions: 


1. What is the meaning of the statement “Two genes 
are genetically linked”? 

2. Why do DNA probes bind specifically to target 
DNA sequences in a chromosome? 
3. Summarize how RFLP and SNP markers are used
to find mutant human genes. 

4. Explain why people who are genetic carriers of 
cystic fibrosis do not become sick with cystic 
fibrosis disease. 

5. Explain the difference between an inherited gene 
mutation and a spontaneous mutation in a gene. 

6. What does the term “X-linked gene” mean in 
terms of human chromosomes? 

7. Explain why testing newborn infants to uncover 
genetic diseases right after birth is important. 

8. Explain how the Internet has created free, unlimited access to science and medicine for the public.

9. Explain why it is advantageous to the cell to 
inherit two copies of every chromosome, even 
though the chromosomes might carry both normal 
and mutant alleles of the same gene. 

10. Describe the evidence in laboratory mice that 
indicates how body weight is controlled at least in 
part by genes. 
Could Gene Therapy Help Alcoholics to Stay on the Wagon?


New Scientist, June 2007 

Many people of East Asian descent get very sick when drinking even a small amount of alcohol. These individuals carry a mutation in the gene encoding the enzyme aldehyde dehydrogenase (ALDH), which not only causes the bad reaction to alcohol but also reduces the risk of becoming an alcoholic by more than two-thirds. Disulfiram (Antabuse) is a common drug used to help alcoholics overcome their addiction. This drug blocks the activity of the same enzyme, aldehyde dehydrogenase, and it seems to help alcoholics stop drinking. However, for this treatment to be effective, the drug must be taken every day, a challenge that many people struggling with addiction find very difficult to meet.

A gene therapy approach might solve this problem because it provides a way to block expression of the aldehyde dehydrogenase gene over the long term. Scientists tested this idea by constructing a virus vector carrying an “antisense” DNA copy of the ALDH gene. Once inside the cell, the antisense DNA base-paired with the ALDH messenger RNA (mRNA) to form an RNA:DNA duplex. This duplex selectively blocked translation of the ALDH mRNA and prevented production of the ALDH protein (enzyme). This therapy has great potential because it reduces enzyme activity much as if the patient had taken disulfiram every day.
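The antisense idea rests on ordinary base-pairing rules, with one twist: RNA uses uracil (U) where DNA uses thymine (T). The Python sketch below is a toy illustration only; it uses a made-up mRNA fragment rather than the real ALDH sequence, and the function names are hypothetical. It builds an antisense DNA for an mRNA segment and checks that the two strands pair perfectly when lined up antiparallel.

```python
# Toy model: design an antisense DNA for an mRNA segment and check that the
# two strands form a perfect RNA:DNA duplex. The mRNA below is made up.

# Which DNA base pairs with each RNA base (RNA uses U where DNA uses T).
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_dna(mrna: str) -> str:
    """Return the antisense DNA, written 5' to 3', for an mRNA written 5' to 3'."""
    return "".join(RNA_TO_DNA_PARTNER[base] for base in reversed(mrna.upper()))


def forms_duplex(mrna: str, dna: str) -> bool:
    """True if every mRNA base pairs with the DNA strand laid antiparallel to it."""
    if len(mrna) != len(dna):
        return False
    # Reversing the DNA lines up its 3' end with the mRNA's 5' end.
    return all(
        RNA_TO_DNA_PARTNER.get(r) == d
        for r, d in zip(mrna.upper(), reversed(dna.upper()))
    )


mrna_segment = "AUGGCUUAC"  # hypothetical mRNA fragment
antisense = antisense_dna(mrna_segment)
print(antisense)                              # GTAAGCCAT
print(forms_duplex(mrna_segment, antisense))  # True
```

A single mismatched base is enough for `forms_duplex` to return False, which reflects why antisense molecules are designed to match their target mRNA exactly.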

Scientists tested this ALDH therapy on “addicted” lab rats bred to crave alcohol. A single injection of the vector carrying an “antisense” DNA copy of the ALDH gene decreased ALDH enzyme activity in the livers of the “addicted” rats by 80%. The rats that had previously craved alcohol drank 50% less for more than a month after the gene therapy treatment. (American Society of Gene Therapy, 2007)


LOOKING AHEAD 


Gene therapy is an innovative use of recombinant DNA technology that offers tremendous potential for the widespread treatment of many human genetic diseases and disorders. For many years, gene therapy has been the center of public controversy, and the early gene therapy trials with human patients suffered very serious setbacks, raising the distinct possibility that the sunny promises offered by its proponents will never be realized. Progress in gene therapy will always face technological challenges and will have to answer ethical, legal, and social questions. On the positive side, however, the field has made some amazing advances, not only in research with laboratory animals but also in some clinical cases involving human patients. This chapter examines the history and current status of gene therapy and explores how its principles have been applied to the successful treatment of human diseases.

FIGURE 11.1 Ashanti and Cynthia were the first human gene therapy patients. (A) Four-year-old Ashanti DeSilva holds the hand of her gene therapy doctor, Dr. W. French Anderson, in 1990. (B) Ashanti (right) and Cynthia (left) visit the Cleveland Zoo together, doing well three years after receiving gene therapy treatment.

On completing the chapter, you should be able to do the following:

• Understand how “normal” (wildtype) genes can be introduced into the cells of a patient through the use of vectors or other delivery methods.

• Explain what goes wrong in individuals who are deficient in a specific protein (enzyme), and outline a possible gene therapy treatment for that specific disease or disorder.

• Understand the characteristics of the different vectors used in gene transfer, and appreciate the mechanisms used by the different vectors during the gene therapy process.

• Describe how RNA interference (RNAi) technology is used as a gene therapy application to treat disease.

• Describe the reasons that different gene delivery methods were chosen to treat a brain disorder like Parkinson’s disease compared to an eye disease that destroys the retina and causes blindness.

• Explain the medical risks and potential complications of gene therapy.


INTRODUCTION 


Beginning in September 1990, DNA history was made when two little girls named Ashanti and Cynthia became the first two people to receive gene therapy treatments for a genetic disease (Figure 11.1). Each girl had inherited a mutant gene that causes a serious immune deficiency disease, leaving her unable to fight infections. This historic gene therapy treatment provided Ashanti and Cynthia with billions of treated cells carrying the normal (wildtype) gene and directing the synthesis of wildtype proteins. In this case the wildtype proteins compensated for the nonfunctional mutant proteins produced in the diseased cells.

The fundamental idea behind gene therapy seems almost too easy: just replace an altered (mutant) gene with the corresponding normal (wildtype) gene. In theory, when a disease is caused by a mutation in a single gene, it should be straightforward to treat the disease with the wildtype gene. The defect caused by the mutant gene in the patient’s cells can potentially be rescued by introducing the wildtype gene, which makes functional proteins. Planning a successful gene therapy strategy depends on knowing the identity of the mutant gene that causes the disease. Scientists must also learn about the specific gene expression patterns and biochemical processes in healthy and diseased cells in order to design the best approach for the gene therapy treatment, including the best way to deliver the therapeutic wildtype gene to the target cells in the body.

Despite the initial successful use of gene therapy to treat Ashanti and Cynthia (discussed later), much controversy surrounded gene therapy in the 1990s and continues today, with plenty of critics and proponents on both sides. Gene therapy still faces many challenges before it can become a widespread treatment for genetic diseases, but specific gene therapy approaches have been used successfully to treat certain diseases and disorders. The potential of gene therapy should not be underestimated, especially in light of the exciting research advances in the field. This chapter covers the facts and controversies surrounding gene therapy and reviews the government regulations for the oversight of human gene therapy trials.

Advances in recombinant DNA technology have solved many of the technical obstacles that once hindered the clinical use of gene therapy. The complete DNA sequence of the human genome helped to launch the fields of genomics and bioinformatics (Chapters 6 and 7) and flooded scientific (and public) databases with extensive information about novel human genes. These studies have revealed that many newly identified human genes are implicated in genetic diseases and are potential targets for gene therapy. Different types of gene therapy are used to treat different genetic diseases, and some gene therapy treatments have advanced to clinical trials with human patients (see Chapter 10).

In some ways, gene therapy is a relatively new field, but it has already had a large impact on many areas of science, medicine, and society. This chapter describes how clinical gene therapy trials are conducted to test some of the many candidate gene therapy treatments. It is beyond the scope of this chapter to report comprehensively on all of the gene therapy activities currently under way around the world. Instead, the goal of this chapter is to describe some of the most interesting and exciting gene therapy trials, including the different vectors used and the cell targeting and delivery systems that make gene therapy successful.


GENE THERAPY: A METHOD TO RESCUE 
MUTANT GENES 


Gene therapy is a medical process in which the wildtype (normal) version of a gene is introduced into a patient's cells to treat a disease caused by a mutant form of the gene that fails to function properly. A disease caused by a mutation in a single gene is a good candidate for gene therapy because it is sometimes possible to replace the single mutant gene in the patient's cells with a normal (wildtype) copy of the same gene. When normal proteins can compensate for the defective mutant proteins, the gene therapy strategy of transplanting a wildtype gene into diseased cells is likely to succeed (Figure 11.2).

FIGURE 11.2 Key steps in a gene therapy treatment. (1) Therapeutic (wildtype) gene (blue DNA helix). (2) Therapeutic gene (blue DNA) is inserted into a vector (yellow arrow). (3) The vector carrying the therapeutic gene is delivered into the host cell. (4) The vector DNA travels into the host cell nucleus. (5) The therapeutic gene is transcribed into mRNAs (not shown), and the mRNAs are translated to produce the therapeutic proteins.

FIGURE 11.3 In vivo gene therapy. Gene therapy performed on cells inside a patient's body using therapeutic wildtype gene DNA.

There are two main approaches used in gene therapy, depending on whether the diseased cells are treated inside the body (in vivo) (Figure 11.3) or are removed from the body for treatment (ex vivo) and then returned to the patient (Figure 11.4). In vivo gene therapy methods must use gene delivery systems that accurately target the therapeutic genes to specific cells in the patient's body (Figure 11.3). If the diseased cells can be removed from the patient, they can be treated with therapeutic DNA in the lab, and the genetically altered cells are then returned to the body of the same patient (Figure 11.4).

One of the biggest challenges in gene therapy is the need for gene delivery vehicles, called vectors, that can accurately target the therapeutic gene to the correct cells in the patient’s body. Gene therapy vectors are usually DNA molecules that carry the therapeutic gene cargo into the diseased cells in the body. Vectors have been derived from many sources, including RNA and DNA viral genomes, circular double-stranded DNA bacterial plasmids, and artificial eukaryotic chromosomes. Recently a new delivery system was developed that uses carbon nanotubes to carry a DNA or drug cargo to the target cells (see Chapter 13).

Gene therapy treatments are designed to correct the defect in cells carrying the mutant gene by introducing the wildtype gene, whose functional protein product can compensate for the defective mutant protein.

FIGURE 11.4 Ex vivo gene therapy. Ex vivo gene therapy is performed on diseased cells removed from the patient. Different gene therapy vectors and methods can be used to treat the cells, depending on the genetic disease. After the cells are treated with the appropriate gene treatment, they are returned to the patient.


Virus DNA Vectors 


Successful gene therapy requires a vector or other delivery system to carry the therapeutic gene into the diseased cells. Many early gene therapy experiments depended on vectors made from the linear double-stranded DNA genome of adenovirus. Adenoviruses have linear DNA genomes enclosed in a capsid made of virus proteins (Figure 11.5). Adenoviruses cause upper respiratory tract infections, which can be serious in people with weakened immunity. However, in the absence of the viral capsid, the adenovirus DNA genome alone is not infectious to humans. To introduce an adenovirus vector into cells, the vector DNA is packaged into a capsid made of viral proteins so that it can be delivered into the nucleus with high efficiency (Figure 11.6). Different vectors have been derived from adenovirus genomes, and technological improvements over time have made modern adenovirus vectors much more efficient and safer to use. The earliest adenovirus vectors carried the risk of serious side effects in animals and people.

FIGURE 11.5 Virus genomes and viral vectors can be packaged into capsids made of viral proteins. (A) The electron micrograph (EM) shows the protein structures of adenovirus capsids and the much smaller adeno-associated virus (arrow). (B) The protein projections on the surfaces of the virus capsids are barely visible in the EM but are shown in the diagram of the adenovirus capsid structure.

FIGURE 11.6 Therapeutic genes in viral DNA vectors are delivered into cells. (1) The wildtype (normal) gene is carried by the viral vector, which is packaged with virus capsid proteins into a recombinant gene delivery system. (2) The recombinant virus carrying the wildtype gene is used to infect the diseased cells. (3) The viral vector and wildtype gene are inserted (integrated) into the chromosome DNA of the host cell (patient’s) genome. (4) The therapeutic wildtype gene in the cell is expressed as mRNAs and wildtype (normal) proteins.

Once the DNA vectors and the genes have entered the target cells in the body and penetrated the nucleus, the therapeutic gene DNA is copied into mRNAs by host cell enzymes. The mRNAs are then processed and transported into the cytoplasm, where they are translated into proteins that provide therapeutic benefits (see Figure 11.6). Scientists must consider many factors when deciding which diseases might benefit from a gene therapy approach. Of course, the disease must result from a DNA mutation in a known gene, and a wildtype, unmutated copy of the gene must be identified and available for cloning (insertion) into an appropriate vector. Each gene therapy treatment also includes a strategy to deliver the wildtype gene(s) into the diseased cells.

Inside the nucleus, the fate of the vector DNA (and the therapeutic gene) depends on the specific properties of the vector used. In some cases, successful treatment requires that the therapeutic genes be actively expressed in the host cell nucleus for prolonged periods of time. To accomplish this, scientists often use a type of vector that integrates (inserts) directly into the chromosome DNA and becomes a permanent part of the host genome. The vector DNA integrates into the genome through recombination between DNA sequences in the vector and DNA in the host chromosome (see Chapter 6). Once integrated, the vector and the therapeutic gene are replicated whenever the genome duplicates and are distributed to daughter cells along with the native chromosomes during cell division.

In addition to adenoviruses, many different virus genomes have been modified for use as vectors in human gene therapy trials. Once inside the cells, vectors derived from circular double-stranded DNA plasmids do not persist for long periods in the nuclei. Even though these circular DNA vectors replicate (duplicate) inside the cell, they are not equally distributed (segregated) to the daughter cells when the cells divide. As a result, DNA vectors that do not insert into the genome are chosen when only transient expression of the therapeutic gene in the host cells is appropriate.


Choosing the best gene therapy vector to use for each 
treatment depends on understanding the basic biological 
mechanisms responsible for the disease process, including 
the tissues and cells involved and the timeline of disease 
progression. 


Retrovirus RNA Vectors 


Some vectors designed for use in human gene therapy have been derived from retrovirus RNA genomes. These viruses get their name from the clever strategy they use to infect the cell and take over its biosynthetic machinery (“retro-” = reversal). Retrovirus infection depends on reversing the usual flow of genetic information in the cell, in which DNA genes are copied into RNA transcripts that are translated into proteins (see Chapter 3). Instead, the retrovirus RNA genome is first copied into DNA, which is then used to make viral RNAs and proteins. This feat is accomplished by a special enzyme called reverse transcriptase, which has the unusual ability to copy the retrovirus RNA into double-stranded DNA (Figure 11.7). Inside the infected cell, reverse transcriptase makes a double-stranded DNA copy of the retrovirus RNA genome, which is then inserted (integrated) into a host cell chromosome in the nucleus to form a provirus. The genes encoded by the provirus are expressed as mRNAs and are translated into the proteins that package new retrovirus RNA genomes. The new virions also contain reverse transcriptase enzyme proteins, ready for use in the next infected cell.
FIGURE 11.7 Retrovirus life cycle. (1) The retrovirus gains access to a cell when a glycoprotein on the surface of the virus envelope binds to a receptor protein on the surface of the cell. The envelope surrounding the virus particle fuses with the cell membrane and empties the retrovirus RNA genome into the cytoplasm of the cell. The retrovirus virion carries reverse transcriptase enzyme proteins in addition to the RNA virus genome. Reverse transcriptase enzymes have the unusual activity of copying RNA into DNA, the reverse of the normal gene expression pathway in the cell nucleus, in which DNA is copied into RNA. (2) The viral reverse transcriptase enzyme copies the retrovirus RNA genome into double-stranded DNA, which is then inserted (integrated) into the host cell genome in the nucleus. (3) The virus genes encoded by the integrated retrovirus genome (the provirus) express viral mRNAs that are translated into viral proteins. The capsid proteins package the new retrovirus RNA genomes into virions, which acquire a membrane envelope with viral proteins as they exit through the cell’s plasma membrane.


To make retrovirus vectors as safe as possible, researchers used standard recombinant DNA methods (see Chapters 4 and 5) to remove the three retroviral genes gag, pol, and env from the retrovirus genome (Figure 11.8). These genes encode the proteins the virus needs to make new infectious particles: gag encodes the capsid proteins that package the genome (or vector) into virions (see Figure 11.7), pol encodes viral enzymes including reverse transcriptase, and env encodes the envelope glycoproteins. Without these proteins, the vector cannot be packaged into a virion and cannot spread to other cells.
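The vector-building steps described here and in Figure 11.8 can be sketched as a toy model in Python, treating a genome as an ordered list of named elements. This is a simplified illustration under stated assumptions: the names are hypothetical, the genome is reduced to five elements, and "can make new virions" is reduced to whether gag, pol, and env are all present. It is not a description of laboratory procedure.

```python
# Toy model of building a retroviral vector (compare Figure 11.8).
# Assumption: a genome is simplified to an ordered list of element names,
# with an LTR at each end.

WILDTYPE_RETROVIRUS = ["LTR", "gag", "pol", "env", "LTR"]
VIRAL_GENES = {"gag", "pol", "env"}  # genes needed to make new virions


def build_vector(genome: list[str], therapeutic_gene: str) -> list[str]:
    """Delete gag, pol, and env, then insert the therapeutic gene between the LTRs."""
    if genome[0] != "LTR" or genome[-1] != "LTR":
        raise ValueError("expected an LTR at each end of the genome")
    remaining = [element for element in genome[1:-1] if element not in VIRAL_GENES]
    return ["LTR", *remaining, therapeutic_gene, "LTR"]


def can_make_new_virions(genome: list[str]) -> bool:
    """In this toy model, new virus particles require all three viral genes."""
    return VIRAL_GENES.issubset(genome)


vector = build_vector(WILDTYPE_RETROVIRUS, "TherapeuticGene")  # hypothetical gene name
print(vector)                                     # ['LTR', 'TherapeuticGene', 'LTR']
print(can_make_new_virions(WILDTYPE_RETROVIRUS))  # True
print(can_make_new_virions(vector))               # False
```

Keeping the LTRs mirrors the point that the vector can still integrate into the host chromosome, while removing gag, pol, and env is what leaves it unable to make new virus particles.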

The development of safe and effective gene delivery options has been a major obstacle to the routine use of gene therapy in medical treatments. Most of the early gene delivery systems relied on vectors derived from viral genomes. Viral delivery systems are often more efficient than nonviral methods because the viral vectors can be packaged into protein capsids that enter the cells using the efficient pathway normally used by the invading virus. However, because viral vectors are modified viral genomes, they can sometimes cause serious adverse reactions in patients. A viral vector might also insert accidentally into an essential gene in the chromosome, causing a mutation that could be lethal to the patient. There is also the potential risk that viral vectors might spread to parts of the body outside the gene therapy treatment zone. Many scientists and biotechnology companies have developed nonviral gene delivery systems as possible alternatives to viral vectors for gene therapy. These include electroporation, which uses an electric field to drive DNA into the cells, and gene guns, which shoot microscopic particles coated with DNA directly into the cells to be treated. Gene guns have also been used successfully to propel genes into subcellular organelles, including mitochondria and chloroplasts (in plant cells) (Figure 11.9) (see Chapter 15). Liposome fusion is another approach to nonviral gene transfer. Liposomes are hollow spheres surrounded by membranes made up of fatlike molecules called phospholipids (Figure 11.10). The vector DNA carrying the therapeutic gene is packaged inside the liposome spheres (vesicles), which are then mixed with the diseased cells to be treated. The liposomes fuse with the plasma membrane of the cell, releasing the DNA into the cytoplasm. By a mechanism that is not well understood, the vector DNA is transported into the nucleus, where the therapeutic gene is expressed (Figure 11.11).

FIGURE 11.8 Building a retroviral vector. (1) The wildtype retrovirus genome contains viral genes (gag, pol, env) and has long terminal repeat (LTR) regulatory sequences located at the ends of the linear RNA genome. (2) To make the retroviral genome safe for use as a gene therapy vector, the genes that the virus uses to make more viruses during an infection (gag, pol, env) were removed (deleted) from the retroviral genome. Without these three genes and their proteins, the retrovirus genome is not infectious. (3) The therapeutic gene needed for the gene therapy was inserted between the LTR sequences at each end of the retrovirus vector. Retrovirus gene therapy vectors retain the ability to integrate into the host cell chromosome, where the vectors can remain integrated in the genome for years without becoming infectious.

FIGURE 11.9 Cells can be transformed using DNA fired by a gene gun. (A) DNA encoding a green fluorescent protein (GFP) gene was introduced into a nerve cell in the brain, causing the body of the nerve cell and its extended axons to fluoresce green. (B) Rice embryos were genetically modified by DNA from a gene gun. The transferred DNA carried a “marker” gene that was expressed in the cells and caused the rice embryos to turn blue (see Chapter 15).
FIGURE 11.10 Liposomes can carry DNA genes. Like eukaryotic
cells, liposomes are surrounded by phospholipid bilayer membranes. 
Liposomes carrying therapeutic DNA genes or drugs can fuse with 
the plasma membrane of a eukaryotic cell to deliver the contents 
inside the cell. 


Viral vectors are the most common vehicles used to deliver therapeutic genes, but new nonviral approaches are under development that are designed to improve delivery and accuracy and that also offer ways to control expression of therapeutic genes in the diseased cells.
FIGURE 11.11 Liposomes deliver genes into the cell nucleus. (1) Liposome with DNA inside (DNA-lipid complex). (2) Cell takes up the liposome-DNA complex by endocytosis. (3) DNA-lipid complex enters cell nucleus. (4) Therapeutic gene is expressed in the nucleus.


Human Artificial Chromosome (HAC) Vectors 


Artificial chromosome vectors are derived from native chromosome DNA sequences, and they replicate in the nucleus alongside the native chromosomes. These linear double-stranded DNA vectors mimic the cell’s chromosomes and assemble with histone proteins to make linear mini-chromosomes that not only replicate (duplicate) just like the native chromosomes but also segregate properly during cell division, ensuring proper gene inheritance by daughter cells (progeny cells). Perhaps the most sophisticated gene therapy vectors ever developed are human artificial chromosomes (HACs). These linear vectors contain all of the DNA elements necessary to function as a native human chromosome but are much shorter. In 1997, researchers at Case Western Reserve University assembled a human artificial chromosome from DNA fragments made in the lab. Each chromosome vector included a centromere to ensure that the vector DNA can attach to spindle fibers and be inherited properly, a DNA replication origin to start DNA replication before cell division, and telomere DNA at both chromosome ends. HAC vectors do not face the size limits on inserted DNA that restrict viral vectors, because artificial chromosomes do not need to be packaged into virus capsids. These vectors also carry the DNA regulatory elements necessary to control expression of the therapeutic gene once inside the diseased cells. Imaging techniques offer one way to monitor the artificial chromosome vectors inside the cells (Figure 11.12).


POSSIBLE RISKS: THE HUMAN SIDE 
OF GENE THERAPY 


All medical treatments have inherent risks, and the development of new drugs and new treatments such as gene therapy carries special risks for the patient. Processes such as informed consent were put in place to protect the many people who volunteer to participate in clinical trials of all kinds. Jesse Gelsinger had just turned 18 years old when he volunteered for a gene therapy trial in 1999 (Figure 11.13). A look at the circumstances of Gelsinger’s case shows how the simple idea of using a good gene to replace a mutant gene is actually a difficult goal, not only from the perspective of the science involved but also from the standpoint of human motives. Gelsinger died as a result of gene therapy, but his case also changed the way that federally supported human gene therapy trials are reviewed and regulated for the safety of the patient.

FIGURE 11.12 Human artificial chromosome. An artificial human chromosome (*) is shown among many native human chromosomes. The centromere regions of the chromosomes, including the artificial chromosome, show up as two bright dots, and the artificial chromosome is a light green color.

Gelsinger was fortunate to have the support of his family when he was diagnosed in 1983 with a nonfatal form of a rare genetic disorder called ornithine transcarbamylase (OTC) deficiency. Lack of the OTC enzyme interferes with the liver’s ability to metabolize ammonia, causing a toxic buildup of ammonia in the body. Babies born with the severe form of OTC deficiency rarely survive. As Jesse Gelsinger grew up, he struggled with the restrictions of his disease, but for the most part he did well. There were times when OTC deficiency made him sick, but Gelsinger persisted. He reached his teen years by managing his disease with a strict low-protein diet (to cut down on the amount of ammonia in his body) and by taking more than 30 medications every day.

In 1998, Gelsinger’s specialist told his family about a clinical OTC gene therapy trial under way at the University of Pennsylvania in Philadelphia. Gelsinger was very interested, but he could not participate until he turned 18. That year he again became very ill with OTC deficiency and was hospitalized in a coma. Gelsinger recovered, and in May of 1999 he graduated from high school. That summer brought Gelsinger’s 18th birthday, and with the support of his family, he decided to volunteer for the OTC gene therapy trial.

FIGURE 11.13 Meet Jesse Gelsinger. Jesse Gelsinger had just turned 18 years old when he volunteered to participate in a gene therapy clinical trial for a disease called OTC deficiency. Gelsinger died a few days after the start of the gene therapy trial, but he also helped to change the way human gene therapy trials are regulated and supervised in the United States.

The Gelsinger family flew to Philadelphia to be 
with Jesse when he was screened in the hospital for 
acceptance into the OTC gene therapy trial (called the 
Batshaw-Wilson study). Gelsinger was very excited 
when he learned that he had qualified for the gene 
therapy trial to be headed by top physician Dr. James 
Wilson and performed at one of the premier institutions 
doing gene therapy at the time, the Institute for Human 
Gene Therapy at the University of Pennsylvania. One 
of Wilson’s colleagues explained how the gene therapy technique would work. While Gelsinger was under sedation, two catheters would be threaded into the blood vessels of his liver: one into the hepatic artery to inject the viral vector into the liver, and another to monitor the blood leaving the liver to be sure that the vector was being absorbed by the cells in the organ. In addition to the risks involved, the gene therapy treatment would not provide a long-term benefit to Gelsinger and the other volunteers. Even if the therapeutic genes worked properly in this trial, the effect would be transient because the body’s immune system would attack and kill the virus vector within four to six weeks after treatment.
On September 13, 1999, the genetically altered adenovirus was injected into Gelsinger’s hepatic artery. Twenty hours after the injection, Gelsinger developed jaundice, a yellow tinge in the eyes and skin that is often a sign of liver failure. Some animals had shown the same complication in earlier tests of this gene therapy vector. Gelsinger soon slipped into a coma, then suffered multiple organ system failure, and was kept on life support until he died four days after the injection.

Seven years before Gelsinger volunteered for the OTC 
clinical trial, the head of the trial, Dr. James M. Wilson, 
had started a for-profit, private company named Genovo, 
Inc., with Wilson as the major shareholder. During the 
1990s, Genovo, Inc. gave almost $5 million each year to 
the Institute for Human Gene Therapy at the University of 
Pennsylvania, at a time when Wilson was its director. In 
1993, Mark Batshaw and Wilson conducted lab experiments on OTC-deficient mice using an adenovirus vector to carry the wildtype OTC gene. These animal studies suggested that the gene therapy treatments might succeed in humans, but the data also revealed significant safety problems with the OTC gene therapy protocol. In
a separate experiment, three monkeys died from a treat- 
ment using an adenovirus vector similar to the vector 
used in the human trials, and additional animals used 
in studies suffered severe hepatitis and liver failure after 
exposure to adenovirus vectors. Genovo, Inc. produced 
the adenovirus vector used in the human OTC trial. 

At least five years before Gelsinger’s death, the gene 
therapy community appeared to be aware of possible 
dangers posed by the adenovirus vector. In 1995, when the Recombinant DNA Advisory Committee of the National Institutes of Health approved the Batshaw-Wilson OTC gene therapy protocol for use on human subjects, two dissenting experts warned that the trial was too risky to include volunteers with the nonfatal form of OTC deficiency. Eventually 19 OTC-deficient adults, including Gelsinger, enrolled in this gene therapy trial, and Gelsinger received the highest dose of the adenovirus vector given in the trial. Since Gelsinger’s death in 1999, there have been several additional gene therapy fatalities in the United States. Questions about the safety of gene therapy arose again in the summer of 2007 after the death of 36-year-old Jolee Mohr, who took part in an experimental gene therapy trial to treat rheumatoid arthritis.

Independent investigators concluded that Jesse 
Gelsinger’s death was partly due to the failure of the 
scientific team to get appropriate informed consent 
from the patient and his family. Informed consent is 
designed to ensure that patients have all of the informa- 
tion that they need to make the best possible decision 
about whether or not to participate in a specific clinical trial. In January 2008, a top scientific journal in the field, Human Gene Therapy, published articles emphasizing the importance of informed consent in gene therapy trials, stressing that it is imperative for patients to understand the risks and benefits of a clinical trial. Ironically, these articles involved some of the key people from Gelsinger’s gene therapy trial: the editor-in-chief of Human Gene Therapy was Dr. James Wilson, and one of the authors, bioethicist Arthur Caplan, had also been a member of Gelsinger’s medical team at the University of Pennsylvania. Both Wilson and Caplan were defendants in the lawsuit brought by the Gelsinger family, which was settled out of court in 2000.


SUCCESSFUL GENE THERAPY 
TREATMENTS FOR HUMAN 
GENETIC DISEASES 


Some of the gene therapy methods used to successfully 
treat human diseases are described here to help illus- 
trate the different scientific approaches, technologies, 
and people involved in the field of gene therapy. Many 
human diseases not mentioned here are good candi- 
dates for gene therapy treatments, and many research 
and clinical studies are currently under way. 

A different type of gene therapy approach was used to treat melanoma, a virulent form of skin cancer. Special immune cells called tumor-infiltrating lymphocytes (TILs), which attack tumors, were isolated from the patients and treated in the lab with a gene encoding the tumor necrosis factor (TNF) protein. The cells began to make this anticancer protein and were then returned to the patient, where the TNF they produced was intended to inhibit the growth of the cancer cells. Cancer cells and new approaches to treating cancer are discussed in Chapter 9.

Cardiac Gene Therapy Improves
Heart Function


The gene encoding the vascular endothelial growth 
factor (VEGF) protein has been used in successful car- 
diac gene therapies. VEGF is a naturally occurring pro- 
tein that can trigger the growth of new blood vessels 
in the body (see Chapter 9). The VEGF DNA can be injected directly into the heart safely and has been highly effective in treating advanced coronary heart disease.
The VEGF protein is expressed in the sick heart muscle 
cells, triggering the growth of new blood vessels in the 
regions of the heart containing the oxygen-starved car- 
diac cells. Instead of using a virus genome as a vector 
to carry the gene DNA into cells, this form of cardiac 
gene therapy uses a DNA plasmid to carry the VEGF 
gene into the heart tissues where it temporarily stimu- 
lates the development of new blood vessels. 


SCID and Adenosine Deaminase (ADA) 
Deficiency: Bubble Boy Disease 


In the United States, about 1 in 100,000 babies are born with some form of severe combined immunodeficiency disease (SCID). One form of SCID is caused by a mutant gene that makes a faulty adenosine deaminase (ADA) enzyme. Normally the ADA enzyme breaks down adenosine and deoxyadenosine, molecules released as DNA and RNA are degraded, into components that the cell recycles for other uses. Without ADA, these molecules build up to levels that are toxic to lymphocytes, including the T lymphocytes (T cells) of the immune system. T cells are essential to the body’s immune system because, among other jobs, they help control the activity of B lymphocytes, the cells that make the antibodies that fight infectious agents. Individuals without the ADA enzyme cannot mount an


Box 11.1 The Impact of Jesse Gelsinger’s Life 


Jesse Gelsinger’s death was the first reported human fatality known to be directly linked to a gene therapy trial. His death served as a warning to scientists everywhere that serious and possibly fatal side effects can accompany the use of some vectors in gene therapy treatments.

Gelsinger’s case helped bring attention to the dangers that can arise when scientists, doctors, and staff fail to follow safety standards in human studies. Formal investigation into the circumstances of Jesse Gelsinger’s death revealed serious problems with the OTC gene therapy trial, including reports of sloppy practices in selecting volunteers, failure to confirm that the volunteers met the necessary medical criteria for enrollment, and recruitment of volunteers with inadequate informed consent. Investigation officials found that the doctors had removed language from the consent forms describing the animal deaths and sickness earlier in the research, including previous studies on the adenovirus vector that noted cases of liver failure in animal subjects. The doctors also failed to report that two volunteers in a previous OTC study had suffered severe reactions at vector dosages lower than the dose Gelsinger received, and they apparently continued the OTC trial using higher vector dosages without consulting the Food and Drug Administration (FDA).

Sanctions were brought against the doctors and researchers involved in Gelsinger’s case, restricting their ability to conduct human clinical trials. In 2005, when Wilson settled with the U.S. Department of Justice, he agreed not to conduct human clinical trials for five years. As a direct result of Gelsinger’s case, the U.S. Department of Health and Human Services created the Gene Therapy Clinical Trial Monitoring Plan, which dramatically increased the level of oversight and scrutiny for the entire gene therapy testing process in humans.
effective immune defense against germs and, without treatment, are at constant risk of life-threatening infections.

The initial ADA gene therapies performed on Ashanti and Cynthia in 1990 were ex vivo treatments, which means that T lymphocytes were removed from each patient and treated in the lab with billions of retrovirus vectors carrying wildtype ADA genes. The treated lymphocytes, now carrying normal ADA genes, were then returned to the patient’s body (Figure 11.14). The entire gene therapy procedure was repeated several weeks later because T lymphocytes normally survive for only a limited time in the human body (whether or not they have undergone gene therapy). The ADA gene therapy administered to the two girls in 1990 is generally considered successful, although some controversy remains because the girls also received ADA enzyme replacement treatment, which makes it hard to separate the benefit of each treatment.

This type of immune deficiency disorder first came to the public’s attention in the 1970s through the story of David Vetter, a boy born with SCID. Vetter lived in a sterile plastic bubble for most of his 12 years, until he died in 1984 after an unsuccessful bone marrow transplant (Figure 11.15). The lives of David Vetter and Ted DeVita, another boy who lived in a sterile environment to avoid infection, sparked public interest and prompted a 1976 TV movie called The Boy in the Plastic Bubble, starring John Travolta. Since that time,


[Figure 11.14 flowchart steps: cloned ADA gene is incorporated into a genetically disabled retrovirus; cells are grown in culture to ensure the ADA gene is active; genetically altered cells that produce ADA are reimplanted (Klug & Cummings 1997).]

FIGURE 11.14 Flowchart of ex vivo gene therapy used to treat babies with SCID. The normal ADA gene was inserted into a retroviral vector 
and packaged into a viral capsid. T cells with the mutant ADA gene were taken from the babies with SCID and treated with the ADA retrovi- 
rus vector, which transferred the normal ADA gene into the T cells from the babies. The T cells expressed the wildtype ADA proteins and were 
transplanted back into the babies where they continued to produce functional ADA enzymes. 
FIGURE 11.15 David was the first “Bubble Boy.” David Vetter (A), who had SCID, and Ted DeVita (B), whose failing bone marrow also left him unable to fight infections, lived for years in different versions of “sterile bubbles,” waiting for the development of an effective treatment. (C) The amazing lives led by David and Ted in their respective sterile environments inspired the movie The Boy in the Plastic Bubble starring John Travolta (1976).
the progress made in transplant technology has greatly improved the effectiveness of bone marrow transplants as treatments for SCID and for other medical conditions caused by the failure of a patient’s bone marrow to produce various types of blood cells (Figure 11.16). To be successful, the donated bone marrow usually must come from a living genetic relative of the patient (such as a sibling) or from an unrelated donor who is “matched” to the patient through special blood tests using HLA (human leukocyte antigen) genetic typing. The risk of rejection (a host-versus-graft reaction) arises whenever a patient receives a transplant of cells or tissues from a donor who is not genetically identical. Rejection is a serious complication that can often be suppressed by medications, but these powerful drugs carry dangers of their own, because they significantly reduce the body’s ability to fight infections.

Almost 15 years after David Vetter died, a baby 
named Owen was living in the United Kingdom in a 
sterile bubble, waiting for a bone marrow transplant to 
cure his SCID. Owen finally received a bone marrow 
FIGURE 11.16 Bone marrow transplant therapy. (A) Blood contains many components that perform important jobs, such as transporting oxygen and nutrients to body tissues, removing carbon dioxide, and carrying cells essential to the immune system. Blood factors are required for clot formation, and other blood components help to prevent and fight infection. (B) Bone marrow is a tissue found inside the long bones of the skeleton that produces blood cells (red blood cells, white blood cells, and platelets). Red blood cells carry oxygen in the blood, white blood cells fight infection, and platelets function in blood clotting. (C) A bone marrow transplant is performed to treat patients who have serious diseases, such as SCID, that can be treated by replacing the bone marrow.




transplant. Even though the bone marrow came from an unrelated donor, the donor was a close genetic match, and the transplant was successful. Owen survived, and remarkably, at age 3, he was at center stage once again when he became the first child to donate his own bone marrow to cure his infant brother, Niall, who was also born with SCID.

In 1995, infants diagnosed with ADA deficiency were treated for the first time with stem cells collected after birth from their umbilical cord blood. In the lab, the umbilical cord cells were treated with vectors carrying the wildtype ADA gene. A few days later, each infant received an infusion of his or her own treated umbilical cord cells, which now carried both the mutant and normal ADA genes. In this case, the long-range treatment goal was to create a permanent population of ADA-producing cells in the infants’ bodies. Surprisingly, after only a single gene treatment, the immune cells in both children began to produce wildtype ADA enzymes and have continued to do so for several years.


Cystic Fibrosis Disease 


Cystic fibrosis is the most common life-threatening genetic disease in the United States, affecting about 30,000 children and adults. Cystic fibrosis occurs when a person inherits two mutant cystic fibrosis transmembrane conductance regulator (CFTR) genes (one from Mom and one from Dad). The normal CFTR is a large membrane protein that forms a channel controlling the movement of chloride ions, a component of salt, across cell membranes (see Chapter 10). People with cystic fibrosis disease can make only the defective mutant cftr proteins in their cells, causing chloride ions to build up in the cells lining the organs and vessels in the body. About 10 million Americans are unknowing carriers of a mutant cystic fibrosis gene; they have both the wildtype (CFTR) and the mutant (cftr) genes but do not suffer disease symptoms. When two carriers have a biological child, each child has a one-in-four chance of inheriting two copies of the defective cftr gene and developing cystic fibrosis disease, with its risk of serious respiratory tract infections and a reduced life span (see Chapter 10).
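The risk that two carriers will have a child with cystic fibrosis can be worked out with a Punnett square. The short Python sketch below enumerates one under the usual textbook assumptions: a single gene is involved, and each parent passes on either of its two copies with equal chance. The function name `offspring_genotypes` and the string labels, which borrow the text’s CFTR/cftr notation, are illustrative choices rather than part of any established tool.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return the probability of each child genotype for a single gene.

    Each parent is a pair of gene copies; the child receives one copy from
    each parent, and every combination is assumed to be equally likely.
    """
    counts = Counter()
    for copy1, copy2 in product(parent1, parent2):
        # Sort the two copies so that CFTR/cftr and cftr/CFTR count as one genotype.
        counts["/".join(sorted((copy1, copy2)))] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


carrier = ("CFTR", "cftr")  # one wildtype copy and one mutant copy
for genotype, probability in sorted(offspring_genotypes(carrier, carrier).items()):
    print(genotype, probability)
# CFTR/CFTR 1/4   (not a carrier)
# CFTR/cftr 1/2   (carrier, no symptoms)
# cftr/cftr 1/4   (cystic fibrosis)
```

Changing one parent to `("CFTR", "CFTR")` shows that, under these assumptions, a child of one carrier and one non-carrier can be a carrier but cannot inherit two mutant copies.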

Although the cftr mutations affect many tissues in the body, the defect is most damaging to the lungs because the abnormal salt levels draw water into the cells and away from the airway surfaces, leaving sticky mucus in the airways that makes it difficult to breathe. Similar problems in cystic fibrosis disease cause severe damage to the pancreas. Scientists therefore first focused on developing cystic fibrosis gene therapy treatments that would work on lung and pancreas tissues.
In initial gene therapy tests, scientists took lung and pancreas cells from a cystic fibrosis patient (cftr/cftr) and grew them in tissue culture in the lab. They used a viral vector to insert the wildtype CFTR gene into the cultured cells. Before treatment, the lung cells contained high levels of salt, as expected for tissues from a cystic fibrosis patient. After the gene therapy treatment, the cultured lung and pancreatic cells expressed the wildtype CFTR gene and proteins, and both types of cells showed a large decrease in internal salt levels.

New drugs and therapies for use in treating humans are evaluated in a federal system of clinical trials with human patients. The approval process involves four phases of clinical trials that can take several years to complete, and extensive preclinical studies must be completed before the clinical trials start. Drugs or therapies that successfully pass the tests in phases I, II, and III are usually approved for use by the general public; phase IV studies continue to monitor them after approval. Phase I trials are the first to test the drug in human subjects, usually a group of 20 to 80 healthy volunteers. Phase I trials are not designed to test how well the drug works as a treatment for a disease; instead, they test the safety of the drug and help determine the appropriate therapeutic dose. After the drug passes the safety tests in phase I trials, phase II trials involve larger groups (100 to 300) of volunteers and patients and are designed to assess how well the drug works at the therapeutic dose(s). Phase III involves multicenter trials with large groups of patients (300 to 3000 or more) that are designed to provide a clear assessment of the drug’s effectiveness compared to the current standard treatment. In preparation for the large-scale phase III cystic fibrosis gene therapy trial scheduled to begin in 2009, scientists wanted to extend the length of time that the therapeutic CFTR gene was expressed in the cystic fibrosis cells to increase the overall effectiveness of the treatment (see Chapter 10).


Box 11.2 Restoring Sight to Blind Dogs and Humans 


For decades researchers have worked toward the goal of 
treating inherited diseases that cause childhood blindness. It 
took longer than 15 years for scientists to identify the mutant 
gene involved in Leber’s congenital amaurosis (LCA), which 
causes vision to rapidly deteriorate early in life. Interestingly, this form of inherited blindness is caused in both humans and dogs by mutations in the RPE65 gene (the human or canine version, respectively) (Figure 11.17). The RPE65 proteins made in humans and dogs have similar structures and perform similar functions in their respective retinal cells.

In humans and dogs, mutations that alter the RPE65 protein invariably damage vision and can cause complete blindness. This reflects the critically important role of the RPE65 protein in recycling retinol, a molecule that the retinal cells need in order to capture light. (The retina detects light, but the image seen by the eye is interpreted by the visual center of the brain, not by the cells in the retina.) The discovery that mutations in the RPE65 gene cause dogs and humans to go blind was a very important step because it allowed researchers to test potential treatments on the blind dogs before using the therapy on humans with LCA.

Scientists at the University of Pennsylvania used gene 
therapy to treat blind dogs with LCA that had inherited the 
mutant canine RPE65 genes. The eye was ideal for the first 
gene therapy treatments because the retina is easily accessible 
for injections to deliver vectors and genes to the target cells 
(Figure 11.18). For the LCA gene therapy test on dogs, the 
researchers injected the normal, wildtype RPE65 gene DNA 
directly into the eyes of dogs that had been blind since birth. 
Amazingly, after the treatment with the wildtype RPE65 gene, the dogs could successfully navigate their surroundings without difficulty.
FIGURE 11.17 Mutations in the RPE65 gene cause loss of vision in humans and dogs. (A) A retinal fundus photograph from a Briard dog with an inherited eye disease. (B) Lancelot, a Briard-mix dog, was born blind, but his vision was restored by gene therapy that provided the wildtype (normal) RPE65 gene and protein. Gene therapy involving dogs holds out hope for curing blindness in people with LCA.


The dogs’ behavior implied that transfer of the wildtype RPE65 gene into the retinal cells had restored the dogs’ vision, but it was important for the scientists to confirm this result by finding evidence that the dogs’ brains actually responded to the restored sight. Using functional magnetic resonance imaging (fMRI), the scientists discovered that the RPE65 gene therapy treatment dramatically changed how the dogs’ brains responded to light, and that the gene treatment had restored function to the visual center of the brain in a dog that had been blind since birth (Figure 11.19). The recovery of visual brain function persisted in one dog for at least two and a half years after gene therapy. These amazing studies indicate that blindness in infancy does not permanently alter the structure and function of the blind brain. When the retina can again detect light, the brain can properly process the information and restore vision.

Based on the success of animal studies on Leber’s congenital amaurosis (LCA), the first human gene therapy trial to treat inherited blindness due to LCA is now under way. A team in London injected the wildtype RPE65 gene, carried on a viral vector, directly into the retinal cells of 12 people who were losing their vision due to inherited LCA. The treated retinal cells are expected to produce wildtype RPE65 proteins that compensate for the defective protein made from the inherited mutant RPE65 gene. A similar gene therapy strategy could potentially be used to treat about 100 different inherited diseases that affect vision. Given the results in dogs, scientists are hopeful that this therapy will also work in people, but it will be some time before the results of this human gene therapy trial are known.

FIGURE 11.18 Retinal gene therapy to treat inherited blindness. Eye diseases are good candidates for gene therapy treatments because doctors have relatively easy access to the eye for direct application of the therapeutic gene DNA to the target cells that require the gene therapy treatment.

FIGURE 11.19 MRI shows activity in visual region of blind dog’s brain. Activity in a dog’s brain is compared before and after gene therapy treatment for the eye disease LCA. Functional MRI measures the activity of the brain cells in the part of the dog’s brain involved in vision. After gene therapy treatment for LCA, the previously blind dogs could maneuver through a maze. Evidence from functional MRI studies supported the conclusion that dogs blind since birth could see after the LCA gene therapy treatment.


Parkinson’s Disease


More than 500,000 Americans have Parkinson’s disease, a brain disorder caused by a decrease in dopamine, a key neurotransmitter in the brain. Neurotransmitters carry nerve signals across the narrow gaps between adjacent neurons called
synapses (Figure 11.20). Without sufficient dopamine, nerve signal transmission is interrupted at the synapses and the neurological symptoms characteristic of Parkinson’s disease appear, including tremors and rigidity in the limbs, slow movements, impaired balance, and poor physical coordination. Medications that replace or mimic dopamine are available to treat Parkinson’s disease, but relief is temporary because the medications lose effectiveness over time and the disease symptoms get progressively worse.

FIGURE 11.20 Neurotransmitters carry impulses across the gap between adjacent nerves. (A) The diagram shows one nerve cell sending signals to another across the synapse, or gap, between the two nerves. (B) When the nerve impulse reaches the end of the first nerve, neurotransmitter molecules are released into the gap between the nerves. The neurotransmitters carry the signal across the synapse to the next nerve cell (the nerve impulse is traveling from left to right).

Individuals with Parkinson’s disease also have overactive nerve cells in the subthalamic nucleus, a small structure deep in the brain that helps control movement. The first human gene therapy trial to treat Parkinson’s disease was conducted by a scientific team from Cornell University, which constructed a virus vector carrying a gene for glutamic acid decarboxylase (GAD). In normal brain cells, the GAD enzyme produces gamma-aminobutyric acid (GABA), a neurotransmitter that naturally inhibits overactive nerves in the subthalamic nucleus.

Nathan Klein was a 59-year-old TV producer 
living in New York City who suffered from debilitat- 
ing Parkinson’s symptoms; his voice was weak, his 
gait was unsteady, and his hands trembled badly. 
In 2003, Klein entered the first human gene therapy 
trial for Parkinson’s disease and a year later his neuro- 
logical symptoms had greatly improved. Gene therapy 
for Parkinson’s disease takes advantage of the ability to 
directly access and treat the region of the brain affected 
in Parkinson’s patients, the subthalamic nucleus. The 
scientists inserted thin tubing through a quarter-sized 
hole in the top of Klein’s skull and threaded it down 
into the interior of his brain (Figure 11.21). A vector 
carrying the GAD gene DNA was administered through 
the tubing directly into the cells in the subthalamic 
nucleus in Klein’s brain. Expression of the GAD gene produced the GABA neurotransmitter, which inhibited the overactive nerves in the subthalamic nucleus and relieved Klein’s symptoms. Following the gene therapy treatment, Klein’s neurological symptoms gradually improved, and the 12 patients who also participated in the first human Parkinson’s gene therapy trial reported similar results.

FIGURE 11.21 First human gene therapy to treat Parkinson’s disease. A thin tube is threaded through a hole in the skull, deep into the brain, to reach the subthalamic nucleus of this Parkinson’s patient. A virus vector carrying the GAD gene is injected into the brain through the tube. The treated brain cells express the GAD protein and produce GABA, which calms the excess nerve activity in Parkinson’s patients.


Sickle Cell Disease 


Sickle cell disease is a serious genetic blood disorder that predominantly affects African Americans (about 1 in 500) and Hispanic Americans. Sickle cell disease was the first human disease for which the gene mutation, a single DNA base pair change in the β-globin gene, was characterized at the molecular level. This seminal discovery was published in 1957, yet the goal of establishing a routine gene therapy approach for the treatment of sickle cell disease remains elusive even today.
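To make “a single DNA base pair change” concrete, the Python sketch below translates DNA codons with the standard genetic code and compares a normal and a sickle sequence. The nine-codon sequence is our rendering of the opening of the human β-globin coding sequence, and the GAG-to-GTG change at codon 6 (glutamic acid, E, to valine, V) is the well-known sickle cell mutation; neither detail is stated in this section, and the function name `translate` is an illustrative choice.

```python
# Standard genetic code: codons ordered by first, second, then third base in "TCAG" order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
        protein.append(AMINO_ACIDS[index])
    return "".join(protein)


# Opening codons of the beta-globin coding sequence, starting with the ATG start codon.
normal = "ATGGTGCATCTGACTCCTGAGGAGAAG"
# Sickle allele: one A -> T change in the middle of codon 6 (GAG -> GTG).
sickle = normal[:19] + "T" + normal[20:]

for position, (a, b) in enumerate(zip(translate(normal), translate(sickle))):
    if a != b:
        # Position 0 is the start methionine, which is removed from mature
        # beta-globin, so this index matches the conventional residue number.
        print(f"residue {position}: {a} -> {b}")  # prints "residue 6: E -> V"
```

A single changed base alters one codon, which swaps one amino acid in the protein; the rest of the sequence is translated exactly as before.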

The “sickle cell” mutation in the β-globin gene dramatically changes the structure of hemoglobin, a critical component of red blood cells that is required to transport oxygen from the lungs to other parts of the body (see Chapter 10). The sickle cell mutation alters a single amino acid in the β-globin protein, which in turn causes the mutant hemoglobin complexes to stick together and form stiff fibers that distort the cell into the sickle shape characteristic of sickle cell disease (Figure 11.22). The painful, often debilitating symptoms of sickle cell anemia occur because the sickle cells get stuck in the smallest blood vessels and block the normal flow of blood to tissues and organs (see Chapter 10).

FIGURE 11.22 Oxygen is carried by hemoglobin in red blood cells. (A) Normal red blood cells have a characteristic Frisbee-like shape, which allows the cells to easily navigate even the smallest blood vessels, called capillaries. The red blood cells contain hemoglobin, which binds to oxygen, permitting the red blood cells to carry oxygen throughout the body. (B) Sickle red blood cells exhibit a dramatic sickle shape caused by a single base pair mutation in the β-globin gene. (C) Blood cells must squeeze through tiny capillaries to reach the extremities of the body, a process that is much more difficult for sickle-shaped red blood cells.

FIGURE 11.23 Lentivirus vectors carry genes into mouse cells. Lentiviruses are unusual among retroviruses because they can infect cells that are not dividing (noncycling), whereas most other retroviruses can infect only cells that are actively undergoing cell division. (A) Lentiviruses attach to the cell surface. (B) Diagram of the lentivirus protein capsid structure. (C) Lentivirus virions.

In 2001, scientists developed a gene therapy strategy that successfully corrected sickle cell disease in mice, suggesting that gene therapy might also successfully treat sickle cell disease in humans (Pawliuk et al. 2001). Scientists used a lentiviral vector to deliver wildtype β-globin genes into mouse cells expressing the mutant β-globin protein (Figure 11.23). The lentivirus family of retroviruses includes the human immunodeficiency virus (HIV), which causes AIDS. However, lentiviral vectors have been carefully modified for safety: they cannot replicate in human cells or cause disease, yet they can efficiently transfer genes to a broad range of human host cells and tissues. Lentiviruses are unusual among retroviruses because they can infect cells that are not dividing (noncycling cells), whereas most other retroviruses can infect only cells that are actively undergoing cell division.

Over the past three decades, biotechnology com- 
panies have developed many commercially available 
“kits” designed to help scientists produce appropriate 
vectors for use in biomedical research. One example is 
the Lentiviral Expression System kit, which is available 
to help scientists propagate lentivirus vector particles carrying the therapeutic gene of choice for a given gene therapy treatment or clinical trial. The scientist provides the DNA encoding the therapeutic gene, and the kit provides the vector and the additional components needed to help the cells grow and package the vector into viral particles. Kits are also available to construct plasmid or viral vectors, to sequence DNA by chain termination reactions, to amplify DNA using the polymerase chain reaction (PCR), and for many other techniques in molecular biology (see Chapter 5).


RNAi: THE FUTURE OF GENE THERAPY? 


In the early years of the twenty-first century, scientists 
developed a powerful new approach called RNA inter- 
ference (RNAi), which became an essential, ubiquitous 
tool in biomedical research. One cancer researcher 
called RNAi a transforming technology. “You can’t 
do a [genetic] experiment without doing RNAi.” In a 
remarkably short time, RNAi grew beyond being one 
of the top-10 basic science breakthroughs in 2002 
and 2003 to become the focus of clinical trials to test 
RNAi-based drugs in treating genetic disease. 

Before scientists can develop a treatment for a spe- 
cific genetic disease, they must first investigate the 
wildtype and mutant forms of the gene and understand 
the structural characteristics and functions of the pro- 
tein that causes the disease symptoms in question. To 
determine the function of an unknown gene, scientists
often observe cells that have had a specific gene
removed from the genome (a gene knockout) or have
had the expression of that specific gene turned off
(gene silencing). The ability to silence genes using
gene knockouts or RNAi is a useful research approach
that permits scientists to turn off expression of a
specific gene at will. Then the scientists can observe
what happens to the cells when the protein encoded
by the silenced gene is absent or reduced.


An essential gene encodes a protein that performs a
function required for the cell to live. A knockout of an
essential gene is usually fatal because the protein with
an essential function is not produced. RNAi technology
allows scientists to attenuate, or fine-tune, the decrease
in expression of a particular gene at the mRNA level.
Because RNAi lowers rather than eliminates the target
mRNA, scientists can often study essential genes
without killing the cells.
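The difference between knocking out an essential gene and partially silencing it with RNAi can be made concrete with a toy model in Python. This is a sketch under loudly stated assumptions, not a biological simulation: protein level is assumed to be simply proportional to the fraction of mRNA that remains, and the survival threshold of 20% of the normal protein level (`SURVIVAL_THRESHOLD`) is a hypothetical number chosen for illustration.

```python
# Toy model of an essential gene (all numbers are hypothetical).
# Assumption: protein level is proportional to the fraction of mRNA remaining,
# and the cell survives only if the essential protein stays above a threshold.

SURVIVAL_THRESHOLD = 0.2  # hypothetical fraction of the normal protein level needed


def knockout_level() -> float:
    """Gene deleted from the genome: no mRNA is made, so no protein."""
    return 0.0


def knockdown_level(efficiency: float) -> float:
    """RNAi destroys the fraction `efficiency` (0 to 1) of the target mRNA;
    the remaining mRNA is still translated into protein."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return 1.0 - efficiency


def cell_survives(protein_level: float) -> bool:
    """The cell lives only if the essential protein meets the threshold."""
    return protein_level >= SURVIVAL_THRESHOLD


if __name__ == "__main__":
    level = knockout_level()
    print(f"knockout: protein {level:.2f}, survives={cell_survives(level)}")
    for efficiency in (0.5, 0.7, 0.9):
        level = knockdown_level(efficiency)
        print(f"RNAi {efficiency:.0%} knockdown: protein {level:.2f}, "
              f"survives={cell_survives(level)}")
```

In this model the knockout always falls below the threshold, while a moderate knockdown leaves enough protein for the cell to survive; only a very efficient knockdown behaves like the knockout.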


RNAi Security: Slicing and Dicing RNA in Cells


Research teams led by Andrew Fire and Craig Mello 
conducting research on nematode worms (C. elegans) 
FIGURE 11.24 RNAi scientists win a Nobel Prize. Dr. Andrew Fire
and Dr. Craig Mello were awarded the Nobel Prize in 2006 for their 
research on RNA interference (RNAi). 


discovered a novel cellular process called RNA
interference that “silences” or “turns off” the expression
of selected genes in living cells. RNAi gene silencing is
an intriguing and important biological mechanism for
which Fire and Mello won the Nobel Prize in Physiology
or Medicine in 2006 (Figure 11.24). The natural RNAi
process has been adapted for use in many applications
of basic research, biotechnology, and potential medical
treatments. At first the researchers thought that they
were studying a phenomenon unique to worms, but
RNAi turned out to be a biological process found in a
wide range of eukaryotic organisms, including humans.

In wildtype cells, RNA interference (RNAi) shuts
off the expression of selected genes, whether the RNAi
mechanism is part of a normal cellular defense process
or co-opted as a gene therapy strategy (Figure 11.25).
The natural process of RNAi is triggered by
double-stranded RNA (dsRNA), such as invading viral
RNA genomes, which the Dicer enzyme cuts into short
interfering RNA (siRNA) molecules. Fire and Mello
showed that dsRNA introduced into worms silences
the gene with the matching sequence; scientists later
designed synthetic siRNAs that, when added to cells,
target the mRNA from a specific gene for cleavage.
The siRNAs bind together with cellular proteins to
make the RNA-induced silencing complex (RISC)
(see Figure 11.25). RISC guides the siRNA to its target
messenger RNA (mRNA) in the cell, and the RISC-siRNA
complex base pairs to the target mRNA. This base
pairing is the key step in the degradation of the selected
target mRNA: RISC cuts the paired mRNA into
nonfunctional fragments, which shuts down translation
of the target mRNA and expression of the protein it
encodes.
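The four steps of the RNAi pathway in Figure 11.25 can be traced with a short Python sketch that treats RNA molecules as strings. Several simplifications are assumptions, not statements from the text: Dicer products are taken to be exactly 21 nucleotides long, RISC always keeps the antisense strand as its "guide," only perfect base pairing counts as a match, and the mRNA is cut opposite guide positions 10 and 11 (counted from the guide's 5' end), which is the site usually reported for the Argonaute "slicer" protein in RISC. All sequences are hypothetical.

```python
# Toy walk-through of the four RNAi steps in Figure 11.25.
# Assumptions: RNA is a string of A, C, G, U; Dicer products are exactly 21 nt;
# RISC keeps the antisense strand as the guide; only perfect pairing matches.

SIRNA_LENGTH = 21
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the strand that base-pairs antiparallel with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna))


def dicer(dsrna_sense: str, size: int = SIRNA_LENGTH) -> list[tuple[str, str]]:
    """Step 1: cut a dsRNA (given by its sense strand) into (sense, antisense)
    siRNA duplexes; a leftover piece shorter than `size` is discarded."""
    duplexes = []
    for start in range(0, len(dsrna_sense) - size + 1, size):
        sense = dsrna_sense[start:start + size]
        duplexes.append((sense, reverse_complement(sense)))
    return duplexes


def load_risc(duplex: tuple[str, str]) -> str:
    """Step 2: RISC keeps one strand of the duplex (here, antisense) as its guide."""
    return duplex[1]


def find_target(guide: str, mrna: str) -> int:
    """Step 3: index of the mRNA site that base-pairs with the guide, or -1."""
    return mrna.find(reverse_complement(guide))


def slice_mrna(guide: str, mrna: str) -> list[str]:
    """Step 4: cut the target opposite guide positions 10 and 11;
    an mRNA with no matching site is left intact."""
    site = find_target(guide, mrna)
    if site == -1:
        return [mrna]
    cut = site + len(guide) - 10
    return [mrna[:cut], mrna[cut:]]


if __name__ == "__main__":
    target_site = "AUGGCUAGCUAGGCUUACGAU"          # hypothetical 21-nt stretch
    mrna = "GGGAAA" + target_site + "CCCUUU"       # hypothetical target mRNA
    dsrna = target_site + "UUAGCGCAUCGAUCGAUCGGA"  # hypothetical viral dsRNA
    for duplex in dicer(dsrna):
        guide = load_risc(duplex)
        print(guide, "->", slice_mrna(guide, mrna))
```

Only the siRNA whose sequence matches the mRNA cuts it into two fragments; the second siRNA leaves this mRNA untouched, which is the sequence specificity that makes RNAi useful as a tool.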

Researchers used the RNAi strategy to design spe- 
cific siRNA molecules that can be used as therapeutic 
FIGURE 11.25 RNA interference (RNAi) mechanism. (1) dsRNA is
cut into short interfering RNAs (siRNAs) by the Dicer enzyme,
(2) the RNA-induced silencing complex (RISC) binds to siRNA,
(3) siRNA-RISC base pairs to target mRNA, and (4) the target
mRNA is cut into nonfunctional fragments.


“drugs” and are engineered to destroy only certain 
targeted mRNAs, for example, the mRNAs produced 
by expressing a harmful mutant gene. The ability of 
RNAi to target the mRNA transcripts made from a 
specific gene without affecting the RNAs transcribed 
from other genes is a very powerful feature of RNAi 
technology. 


Researchers engineer the siRNAs to base pair to a short 
sequence in the target mRNA, shutting down expression of 
the proteins encoded by the target mRNAs; RNAi can tar- 
get and degrade invading viral RNAs, or it can be used to 
target mutant genes that cause genetic diseases. 
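Choosing a short target sequence can be sketched in Python as a scan over every window of the target mRNA, keeping only windows that do not also occur in other transcripts, so the resulting siRNA should not silence other genes. This is a hypothetical, simplified filter rather than a published design tool: it checks exact matches only, whereas real off-target checks also consider partial matches, and the GC-content window of 30 to 55 percent (`gc_range`) is an illustrative assumption. The example uses 4-nucleotide windows only to keep the sequences readable.

```python
# Hypothetical siRNA guide picker (simplified; not a published design tool).
# Assumptions: exact-match specificity only; the GC window is illustrative.

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the strand that base-pairs antiparallel with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna))


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in `seq`."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def design_guides(target_mrna: str, other_mrnas: dict[str, str], length: int = 21,
                  gc_range: tuple[float, float] = (0.30, 0.55)) -> list[tuple[int, str]]:
    """Return (position, guide) for each `length`-nt site in `target_mrna`
    that passes the GC filter and is absent from every other transcript."""
    low, high = gc_range
    guides = []
    for pos in range(len(target_mrna) - length + 1):
        site = target_mrna[pos:pos + length]
        if not low <= gc_fraction(site) <= high:
            continue
        if any(site in other for other in other_mrnas.values()):
            continue  # this stretch also occurs in another gene's mRNA
        # The guide is the strand that base-pairs with the site.
        guides.append((pos, reverse_complement(site)))
    return guides


if __name__ == "__main__":
    target = "AUGCAUGG"              # hypothetical, shortened for readability
    others = {"gene_b": "CCGCAUCC"}  # hypothetical unrelated transcript
    print(design_guides(target, others, length=4))
```

The window "GCAU" is rejected because it also appears in the unrelated transcript; each remaining guide pairs with exactly one site in the target mRNA.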


Human siRNA Therapy: Treatment for Macular Degeneration in the Human Eye


The ability of scientists to use RNAi technology to 
silence selected genes in living cells has made RNAi an 
increasingly valuable tool for the treatment of human 
diseases. Some genetic diseases are caused by the inap- 
propriate expression of genes that are normally held 
silent (are not transcribed) by control mechanisms in 
the cell. RNAi suppression of these harmful genes has 
the potential to have a large impact on the future
treatment of human diseases. RNAi technology could
allow doctors to target a specific disease-causing gene
and silence that gene in the human body, ideally
without interfering with the expression of other
human genes.

Cancer cells can activate the biological process of
angiogenesis, the growth of new blood vessels, by
releasing the vascular endothelial growth factor (VEGF)
protein (see Chapter 9). Cancer cells need access to new
blood vessels to continue to divide and grow into a
tumor. New RNAi-based cancer drugs were developed
with the goal of using RNA interference to shut off
expression of the VEGF gene in cancer cells and, as a
result, cut off the blood supply, starving the cancer cells.

Recently RNAi was used to treat age-related
macular degeneration (AMD), which impairs the vision
of more than 1.5 million adults over the age of 50 in
the United States. AMD is an eye disease that destroys
central vision by damaging the macula, the central part
of the retina (the light-sensitive tissue that lines the
back of the eye). In some patients, macular degeneration
causes the excessive growth of blood vessels in the
retina, resulting in swelling and inflammation that
progressively interfere with central vision. In 2004,
Sirna Therapeutics began a clinical human trial to test
an RNAi-based treatment for AMD. Scientists used
RNAi technology to turn off expression of the VEGF
gene in the retinal cells of people with AMD. The
siRNA “drug” used in the AMD trial was a short
RNA (siRNA) molecule containing an RNA sequence
designed to base pair specifically with a short
sequence of the mRNA encoding the VEGF protein.

The phase I AMD gene therapy trial tested specific
siRNAs that base pair to the VEGF mRNA sequence as
RNAi drugs designed to destroy only the targeted VEGF
mRNAs in the cells. The siRNA drug was injected into
the patients’ eyes to deliver it directly to the cells in
the retina. Although this phase I trial was designed to
look for possible side effects of the siRNA drug and was
not meant to measure the effectiveness of the RNAi
treatment, as many as 25% of the AMD patients in the
trial reported improvement in vision, and the vision of
the remaining patients stabilized and did not worsen
(Figure 11.26).

Many additional RNAi-based therapies are currently
in the early stages of human trials. Other human
diseases are candidates for siRNA clinical trials,
including sickle cell disease, for which a new approach
combines stem cell-based gene therapy with RNA
interference to genetically reverse the disease. Sickle
cell disease can be cured by a bone marrow transplant
that transfers healthy blood-forming stem cells from a
biological relative or close genetic match to the patient.
However, this option is not available to most patients
because of the difficulty in finding a compatible donor,
especially for patients from minority groups, for whom
fewer tested donors are available.
FIGURE 11.26 RNAi gene therapy for macular degeneration. Age-related macular degeneration is a common cause of vision problems in
older people, often beginning with loss of central vision (B). Normal vision is shown in (A). The VEGF protein promotes the growth of new
blood vessels in the retina at the back of the eye. The gene therapy strategy to treat age-related macular degeneration is based on the
ability of siRNA technology to silence specific genes, in this case the VEGF gene in the retina cells.


ENZYME REPLACEMENT THERAPY: AN ALTERNATIVE TO GENE THERAPY?


Some genetic diseases can be treated successfully 
with appropriate doses of the wildtype enzyme pro- 
tein (not the gene). This enzyme replacement therapy 
(ERT) is a strategy that would potentially bypass the 
need for gene therapy treatment in some cases. For 
patients with adenosine deaminase deficiency (ADA-SCID),
ERT is a possible option because PEG-ADA is an effective
treatment for ADA-SCID. The PEG-ADA drug was created
by linking purified ADA enzyme proteins to a
nonmetabolizable carrier made from polyethylene glycol
(PEG). The PEG-ADA drug provides the patient's cells
with the active ADA enzyme, with few side effects.
Although this is a promising treatment, the PEG-ADA
drug is extremely expensive and at this time it must be
taken by injection for life.

Another candidate for ERT is Gaucher’s disease
(GD), an inherited lysosomal storage disease caused by
a defect in the cellular lysosomal pathway. In healthy
cells, fatty molecules called glucocerebrosides are
degraded by glucocerebrosidase enzymes carried inside
the lysosome vacuoles (Figure 11.27). People with
Gaucher's disease have a mutation in the gene for the
glucocerebrosidase enzyme, so the glucocerebrosides
are not degraded properly; instead, these lipids
accumulate in the spleen, liver, and bone marrow,
causing serious problems such as anemia; bone, liver,
and spleen damage; and neurological deficits. In
samples taken from Gaucher's disease patients, the
macrophages stain dark blue because of the
accumulation of excess glucocerebrosides (Figure 11.28).

ERT requires a commercial source of the specific 
enzyme proteins to be used in the treatments. In initial 
tests on ERT, 12 Gaucher's disease patients received 
FIGURE 11.27  Lysosome pathway in the cell. (1) The RER and 
Golgi are part of the membrane system used by cells to secrete 
newly made proteins out of the cell, or to incorporate the proteins 
into the plasma membrane. (2) Lysosome vacuoles (vesicles) contain 
many different degradation enzymes (proteases, lipases, nucleases, 
and polysaccharidases). 


doses of wildtype glucocerebrosidase enzyme purified
from human placental tissue, and all the patients
showed dramatic improvements in disease symptoms.
The success of this approach prompted scientists to 
FIGURE 11.28 Lysosomes from cells with Gaucher's disease. Gaucher's 
disease is caused by a defective glucocerebrosidase gene and enzyme. 
In lysosomes, glucocerebrosidase enzymes normally catalyze the 
breakdown of sphingolipids. Macrophages from this Gaucher's dis- 
ease patient are swollen and stained dark blue because of the accu- 
mulation of glucocerebrosides in the cells. 


develop an alternative, reliable source of human
glucocerebrosidase enzyme by cloning the
glucocerebrosidase gene into a DNA vector and producing
large amounts of the glucocerebrosidase protein in
cultured host cells. This approach ensures a future
source for the large amounts of enzyme needed for
lifetime ERT treatments for Gaucher's patients without
relying on a limited supply of human placental tissue.

Beano is a commercially available enzyme replacement
therapy used to treat individuals with the
gastrointestinal disorder called complex carbohydrate
intolerance (CCI). This disorder affects individuals who
lack the alpha-galactosidase enzyme that degrades
complex carbohydrates into sugar building blocks
(Figure 11.29). Complex carbohydrates that are not
properly digested by people with CCI are fermented by
bacteria in the intestine, producing gas and abdominal
pain. Beano prevents the symptoms of CCI by supplying
the missing enzyme in the digestive tract. This treatment
also permits people with CCI to receive the health
benefits available from foods rich in complex
carbohydrates. Another oral ERT is Sucraid, which is
used to treat sucrase deficiency in people who inherit
congenital sucrase-isomaltase deficiency and lack the
sucrase enzyme.


SUMMARY 


Gene therapy is a remarkable approach to curing 
genetic diseases. In a gene therapy treatment, the patient 
receives the wildtype gene that produces the necessary 
functional protein, and as a result the disease symptoms 
FIGURE 11.29 Over-the-counter enzyme replacement therapy (ERT).
(A) People who are intolerant of complex carbohydrates (CCI) and
cannot digest them properly suffer abdominal pain and gas. Fortunately,
CCI symptoms can be prevented by Beano, which supplies the missing
enzyme in the digestive tract and allows people with CCI to
include complex carbohydrates in their diet. (B) The ERT product
Sucraid treats sucrase enzyme deficiency, which is caused by
inheriting congenital sucrase-isomaltase deficiency.


are alleviated. The goal of gene therapy is to rescue 
the genetic defect caused by the mutation by using the 
wildtype gene rather than just treating the disease symp- 
toms. Gene therapy offers great promise for the future 
development of life-saving treatments for patients with 
many different genetic diseases. 

A key part of any gene therapy strategy is the 
choice of vector to carry the therapeutic gene to the 
cells in need of treatment. Many virus genomes have 
been modified for use in gene therapy treatments 
based on the characteristics of the vector and the 
specific requirements of the treatment protocol. Some 
vectors used for gene therapy are derived from
retrovirus genomes, but the vectors have been altered
to ensure that they do not harm human cells. Once the
therapeutic gene has been inserted into the retrovirus
vector, the vector enters the nucleus of the cell and
becomes inserted into a host cell chromosome, which is
one way to achieve long-term gene therapy for the patient.

Nonviral vectors are becoming more prevalent 
as scientists have combined DNA technology with 
nanotechnology to better approach the challenges of 
gene therapy (see Chapter 13). Researchers continue 
to test other ways to deliver genes into cells, includ- 
ing liposomes, nanoparticles, and inhalation aerosols. 
Safe and effective gene therapy treatments depend
on detailed knowledge of the mechanism of the disease
in question. Researchers must decide which diseased
cells are accessible for the delivery of therapeutic
DNA, without harming other cells in the body.
Scientists often find that the technologies developed
to treat one disease can also be useful for the
treatment of different diseases.
The field of gene therapy is controversial, and ethical
questions continue to be raised. To guard against the
accidental transmission of altered human genomes to
the children in the next generation, only nonreproductive
(somatic) human cells may be used in gene therapy in
the United States. A multilayered review system
involving the National Institutes of Health (NIH) and
the Food and Drug Administration (FDA) was
established to ensure the safety and efficacy of gene
therapy treatments.
Gene therapy opponents point out the availability of 
alternative treatments for some diseases, including 
enzyme replacement and bone marrow transplants for 
some blood diseases. Proponents argue that the high 
cost of alternative treatments is not usually covered by 
health insurance, and the success of any bone marrow 
transplant still requires a closely matched donor. 

In this chapter we discussed only a few of the human 
diseases treated with gene therapy approaches, but 
many other diseases, including cancer, are also good 
candidates for gene therapy treatments. Updated
information on human genetic studies, including the
ongoing and planned human gene therapy trials (phases
I, II, and III), is available online. The completion of the
Human Genome Project provided the scientific
community, and in fact the entire world community,
with access to the 20,000 (or so) gene instructions
about how to make humans, human. Mutations in
thousands of these human genes can potentially cause
human diseases, possibly as many as 10,000
single-gene disorders (see Chapter 10). Clearly, many
people who inherit these genetic mutations and the 
associated diseases are likely to benefit from the devel- 
opment of safe, effective gene therapy treatments. 

The future potential of gene therapy technology lies 
in a combination of ingenious ideas and new technolo- 
gies that are sufficiently flexible to meet the needs of 
the patient and beat the disease. The explosion of RNA 
technologies, many based on applications of the RNAi 
(siRNA) treatments, has inspired and excited the entire 
scientific community. Our understanding of the way 
genes work (combined with the huge wealth of infor- 
mation offered by the human genome sequences and 
genomic research) is finally poised to benefit everyone. 


REVIEW 


In this chapter we reviewed the current and future 
prospects for different human gene therapy treatments. 
To test your comprehension of the chapter's contents, 
answer the following questions: 


1. What types of vectors are used for gene therapy 
experiments, and how do scientists decide which 
vector to use in a particular gene therapy case? 
2. Describe the first successful experiment involving 
gene therapy for individuals with ADA deficiency. 

3. What are the main differences between in vivo 
and ex vivo gene therapy methods? 

4. Summarize how gene therapy can be used to treat 
cancer. 

5. What are the major advantages and disadvantages 
associated with using RNAi for gene therapy? 

6. What is the genetic basis of cystic fibrosis, and 
what intervention can be made with gene therapy? 

7. Describe the main characteristics of a nonviral 
gene delivery system. 

8. Explain why “Kill the messenger RNA” is an 
appropriate nickname for the RNA interference 
(RNAi) process. 

9. How do scientists decide what vector to use in a 
particular gene therapy? 

10. Explain the risks associated with gene therapy 
treatment. 
Windpipe Transplant Breakthrough 


BBC News, http://news.bbc.co.uk/2/hi/health/7735696.stm 

By Michelle Roberts 

Surgeons in Spain have carried out the world’s first tis- 
sue-engineered whole organ transplant—using a windpipe 
made with the patient's own stem cells. The groundbreak- 
ing technology also means for the first time tissue trans- 
plants can be carried out without the need for antirejection 
drugs. The patient, 30-year-old mother-of-two Claudia 
Castillo, needed the transplant to save a lung after con- 
tracting tuberculosis. The Colombian woman's airways 
had been damaged by the disease. Scientists from Bristol 
helped grow the cells for the transplant and the European 
team believes such tailor-made organs could become the 
norm. To make the new airway, the doctors took a donor 
windpipe, or trachea, from a patient who had recently 
died. Then they used strong chemicals and enzymes to 
wash away all of the cells from the donor trachea, leav- 
ing only a tissue scaffold made of the fibrous protein col- 
lagen. This gave them a structure to repopulate with cells 
from Ms Castillo herself, which could then be used in an 
operation to repair her damaged left bronchus—a branch 
of the windpipe. By using Ms Castillo’s own cells the doc- 
tors were able to trick her body into thinking the donated 
trachea was part of it, thus avoiding rejection. 

Two types of cell were taken from Ms Castillo: cells 
lining her windpipe, and adult stem cells—very immature 
cells from the bone marrow—which could be encouraged 
to grow into the cells that normally surround the windpipe. 
After four days of growth in the lab in a special rotating 
bioreactor, the newly-coated donor windpipe was ready 
to be transplanted into Ms Castillo. Her surgeon, Professor 
Paolo Macchiarini of the Hospital Clinic of Barcelona, 
Spain, carried out the operation in June. Five months on 
the patient, 30-year-old mother-of-two Claudia Castillo, is 
in perfect health, The Lancet reports. 


LOOKING AHEAD 


This chapter gives an overview of the exciting and 
dynamic study of stem cell biology and introduces the 
key concepts necessary to understand this fast-moving 
field while providing the reader with a working knowl- 
edge of this interesting subject. All stem cells have the 
same defining characteristics, but there are many dif- 
ferent kinds of stem cells, perhaps as many types of 
stem cells as there are types of tissues. This chapter 
reviews the basic concepts about these amazing cells 
and updates the status of work on the most widely 
studied stem cells. 
On completing this chapter, you should be able to
do the following:


• Provide the functional definition of a stem cell listing
the two most important features.

• Know the major differences between multipotent
stem cells and pluripotent stem cells and how they
are related to each other.

• Explain the essential functions of embryonic stem
cells during embryo development.

• Understand the most important differences among
adult skin cells, adult stem cells, and embryonic
stem cells.

• Describe the advantages and disadvantages of using
adult stem cells or embryonic stem cells for the
development of stem cell therapies.

• Describe the advantages of someday having access
to patient-specific blastocyst embryos.

• Explain the origin of induced pluripotent stem
(iPS) cells and why they represent a very important
advance in the field of regenerative medicine.

• Describe the basic features of genome
reprogramming and the role of epigenetic changes in
the cell.


INTRODUCTION 


Stem cells are an amazing type of cell that has captured
the fascination of scientists and the public alike. Stem
cells have been studied for more than 50 years.
Treatments such as bone marrow transplants have
become routine for some diseases, but it was not until
1998, when scientists first isolated human embryonic
stem cells, that stem cells became a major topic of
discussion and debate in the research community, in
the halls of Congress, and at the kitchen table.


FIGURE 12.1 
Human embryonic stem cells hold great promise for 
advances in basic research and have enormous poten- 
tial for the development of novel therapies to treat 
many degenerative diseases. 

Despite their enormous potential, the biologi- 
cal source of embryonic stem cells (ESCs), human 
embryos, has made research and the development of 
human treatments extremely controversial (Figure 12.1). 
President George W. Bush entered the debate in 2001,
at the start of the explosion of stem cell research, when
he restricted federal funding for human embryonic stem
cell research in the United States to a limited number
of existing cell lines. This slowed publicly funded
research on stem cells but promoted an increase in
private, corporate sources of support for stem cell
research in the United States and abroad. It also
increased research to develop new ways to generate
stem cells like ESCs without using human embryos.


Human embryonic stem cells hold great promise for the 
development of novel therapies to treat many diseases. 
Despite such amazing potential, the fact that ESCs
come from human embryos has made this research
extremely controversial and led to the development of new 
methods to avoid the use of human embryos. 


STEM CELLS GENERATE NEW TYPES OF SPECIALIZED CELLS


The field of stem cell research was launched in
the early 1960s by scientists James Till and Ernest
McCulloch, who discovered for the first time that
stem cells in mice give rise to the many different
specialized mouse cells in the blood and immune
systems. In the early stages of development, mammalian
embryos, including mouse and human embryos, form 


Figure panels: Red blood cells; Heart muscle.

Embryonic stem cells (ESCs) have the most developmental potential. ESCs grow in the inner cell mass
(ICM) inside the hollow blastocyst embryo (shown in cross-section). ESCs are vitally important because they have the
potential to generate all of the different types of specialized cells needed in the embryo and to develop an adult human. When ESCs are
harvested to generate specialized cells for tissue replacement therapy, the blastocyst embryo is destroyed.
hollow balls of cells called blastocysts (Figure 12.2). Till 
and McCulloch were studying the bone marrow cells in 
mice, looking for components in the bone marrow that 
might rescue mice that were previously exposed to lethal 
doses of radiation. The radiation destroyed the cells in 
the bone marrow of the irradiated mice, but these sci- 
entists found that the doomed mice could be saved by a 
bone marrow transplant from nonirradiated mice. 

Till and McCulloch analyzed the bone marrow cells
from the nonirradiated donor mice to look for specific
factors that could account for the regenerative potential
of the transplanted bone marrow. They discovered that
only one type of transplanted cell, the stem cells that
give rise to the blood cells, could regenerate healthy
cells in the irradiated mice. For their fundamental work on stem
cells, Ernest McCulloch and James Till won the Lasker 
Award for Basic Medical Research in 2005. Even today, 
scientists judge the developmental potential of putative 
stem cells using the standards of self-renewal and multi- 
lineage differentiation set by Till and McCulloch. 


Stem Cells Have Developmental Potential 


Stem cells are categorized as either pluripotent or 
multipotent cells based on their ability to grow and
reproduce (self-renewal) and their potential to generate
different types of cells by cell differentiation (Figure 
12.3). The stem cells with the highest potential to 
generate all types of specialized cells grow inside the 
blastocyst. These pluripotent stem cells are called 
embryonic stem cells (ESCs) (Figure 12.4). ESCs exist 
temporarily in mammalian embryos at a certain time 
during very early development when the embryo 
begins to make cells with more specific functions. 
Eventually the ESCs in a human embryo will generate 
all of the 200 or so different types of cells required to 
grow a human body. The underlying promise of stem 
cell research is the potential to use the amazing ESCs 
to develop new and powerful cell replacement treat- 
ments for many human diseases and disorders (Figure 
12.5). Cell and tissue replacement therapies involve 
replacing the damaged or diseased tissues much like 
organ transplantation is designed to replace diseased 
or damaged organs. 

ESCs are pluripotent cells that can generate 
multipotent stem cells, which have far less develop- 
mental potential than the ESCs and develop into far 
fewer types of specialized cells. Multipotent stem cells 
are partially specialized or partly differentiated stem 
cells, which are produced in both adult and embryonic 




FIGURE 12.2 Blastocyst embryos contain the infamous embryonic stem cells (ESCs). (A) Cross section diagram of a blastocyst shows that
the embryo is a hollow ball with the ESCs growing inside. (B) Image shows the surface topology of an intact blastocyst embryo, stained to
indicate the individual cells in the embryo. (C) A small number of human ESCs (green cells) can be seen growing inside a hybrid
developing mouse embryo (blue cells). (D) Blastocyst embryo from a cheetah. (E) Human blastocyst shows each cell nucleus stained pink. (F) This
blastocyst was made from a new approach using four transcription factor genes to reprogram ordinary adult skin cells and convert them into
induced pluripotent stem (iPS) cells.
FIGURE 12.3 ESCs generate other important stem cells. During embryo development the ESCs give rise to other stem cells, which are the
precursors that produce the more specialized cells required by the adult body. For example, the ESCs in the embryo generate the hematopoietic
stem cells in bone marrow that give rise to the red and white blood cells in the adult human. Other progenitor stem cells give rise to muscle,
nerve, bone, and other specialized tissues.


FIGURE 12.4 A human embryonic stem cell. This color-enhanced 
scanning electron micrograph (SEM) shows a human embryonic 
stem cell (gold) growing outside of the blastocyst on top of a layer 
of flat fibroblast cells. In the lab, the fibroblast cells provide factors 
that support the growth of undifferentiated ESCs. ESCs can generate 
any type of specialized cell in the human body. 


tissues. Multipotent adult stem cells are generated
when pluripotent embryonic stem cells differentiate to
form more specialized cells (see Figure 12.2).
Multipotent stem cells are also called “somatic” stem
cells or “tissue-derived” stem cells and are further
identified either by the tissue from which they were
isolated or by the specialized cells that they become
when terminally differentiated. For example, the
“neural stem cells” give rise to highly specialized nerve
cells in specific areas of the adult human brain. The
primary difference between pluripotent and multipotent
stem cells is that pluripotent stem cells have a broader
developmental repertoire and can generate cells that
contribute to all three major germ layers required for
embryo development: the ectoderm, endoderm, and
mesoderm (see Chapter 9). Multipotent stem cells have
a much more limited potential to differentiate into
specialized kinds of cells.
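The difference in developmental potential can be pictured as a lineage tree in which a pluripotent cell can reach specialized cells in all three germ layers, while a multipotent cell reaches only a few cell types within one layer. The Python sketch below uses a hypothetical, heavily simplified lineage map (`LINEAGE`) built for illustration only; real developmental lineages are far more complex. Following the definition in the text, a cell counts as pluripotent here if its descendants span the ectoderm, mesoderm, and endoderm.

```python
# Hypothetical, heavily simplified lineage map for illustration only;
# real developmental lineages are far more complex.
GERM_LAYERS = ("ectoderm", "mesoderm", "endoderm")

LINEAGE = {
    "embryonic stem cell": ["ectoderm", "mesoderm", "endoderm"],
    "ectoderm": ["neural stem cell", "skin cell"],
    "mesoderm": ["hematopoietic stem cell", "muscle progenitor",
                 "heart muscle progenitor", "bone progenitor"],
    "endoderm": ["liver cell", "lung cell"],
    "neural stem cell": ["neuron", "glial cell"],
    "hematopoietic stem cell": ["red blood cell", "white blood cell"],
    "muscle progenitor": ["skeletal muscle cell"],
    "heart muscle progenitor": ["heart muscle cell"],
    "bone progenitor": ["bone cell"],
}


def specialized_types(cell: str) -> set[str]:
    """All terminally differentiated cell types reachable from `cell`."""
    children = LINEAGE.get(cell, [])
    if not children:
        return {cell}  # no further divisions listed: a specialized cell
    reachable = set()
    for child in children:
        reachable |= specialized_types(child)
    return reachable


def layers_reached(cell: str) -> set[str]:
    """Germ layers whose specialized cells `cell` can produce."""
    types = specialized_types(cell)
    return {layer for layer in GERM_LAYERS if types & specialized_types(layer)}


def is_pluripotent(cell: str) -> bool:
    """Pluripotent here means: descendants in all three germ layers."""
    return layers_reached(cell) == set(GERM_LAYERS)


if __name__ == "__main__":
    for cell in ("embryonic stem cell", "hematopoietic stem cell", "neural stem cell"):
        print(f"{cell}: {len(specialized_types(cell))} types, "
              f"pluripotent={is_pluripotent(cell)}")
```

In this toy map the embryonic stem cell reaches all ten specialized types, while the hematopoietic and neural stem cells each reach two types within a single germ layer.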

The ability of stem cells to execute asymmetric cell 
division is fundamental to the mechanism that permits 
a stem cell to acquire new and specialized functions 
by differentiation (see Chapter 9). When a stem cell
divides, it must follow one of two major paths: either
to divide symmetrically to
produce two identical cells (self-renewal) or to divide 
asymmetrically to yield one cell identical to the par- 
ent cell and a second cell that has specialized and 
exhibits different traits (Figure 12.6). The daughter 
cells that result from such an asymmetric cell divi- 
sion event contain the same genes as the parent stem 
cells, but they have different developmental potentials. 
Epigenetic changes in the genome of the daughter cell 
alter gene expression patterns but do not change the 
DNA sequence of the genome. 
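The bookkeeping behind the two division paths can be made concrete with a deliberately simplified toy model in Python. It assumes that every stem cell in the population divides the same way in a given round and that specialized daughter cells never divide again; the function names `divide` and `run` are illustrative only, and the sketch is not a biological simulation.

```python
from typing import List, Tuple


def divide(stem: int, specialized: int, mode: str) -> Tuple[int, int]:
    """Apply one round of division to every stem cell (toy model).

    Specialized cells are assumed not to divide in this simplified sketch.
    """
    if mode == "symmetric":
        # Self-renewal: each stem cell yields two identical stem cells.
        return stem * 2, specialized
    if mode == "asymmetric":
        # Each stem cell yields one stem cell plus one specialized daughter.
        return stem, specialized + stem
    raise ValueError(f"unknown division mode: {mode!r}")


def run(modes: List[str], stem: int = 1) -> Tuple[int, int]:
    """Start from `stem` stem cells and apply the listed rounds in order."""
    specialized = 0
    for mode in modes:
        stem, specialized = divide(stem, specialized, mode)
    return stem, specialized


if __name__ == "__main__":
    print(run(["symmetric"] * 3))    # (8, 0): the stem cell pool expands
    print(run(["asymmetric"] * 3))   # (1, 3): pool stays constant, specialized cells accumulate
    print(run(["symmetric", "symmetric", "asymmetric"]))  # (4, 4)
```

Symmetric rounds enlarge the stem cell pool without producing specialized cells, while asymmetric rounds keep the pool constant and add one specialized cell per stem cell per round. Real tissues mix both modes under the control of signals from the stem cell niche, so the numbers only illustrate the difference between the two paths.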

Embryonic stem cells have the highest potential to 
generate all types of specialized cells in the body. To 
develop successful regenerative therapies involving 
human stem cells, it is imperative to understand how 
stem cells are generated, sequestered, and nurtured 
in the body. The stem cell niche refers to the imme- 
diate environment surrounding stem cells in the body, 
which provides physical support, supplies nutrients, 
and transmits molecular and chemical signals among 
the cells. These signals control the development of the 
stem cells and influence the differentiation of stem 
cells into the required types of cells at the appropriate 
time. The stem cell niche is a dynamic environment supported by components of the extracellular matrix and by various physiological metabolites, growth factors, and hormones made by the stem cells and their progeny cells.

FIGURE 12.5 The promise of stem cell research. The underlying promise of stem cell research is based on the amazing potential of ESCs not only to develop new and powerful cell replacement treatments but also to test drugs and toxins and to better treat and prevent birth defects.

FIGURE 12.6 Asymmetric cell division generates differentiated cells. (A) When stem cells undergo mitotic cell division, they can divide symmetrically so that each dividing cell produces two identical progeny cells (self-renewal). (B) Alternatively, the cells can divide asymmetrically, in which case the dividing cell produces one identical stem cell and one differentiated cell that has undertaken a different developmental pathway.


Diminishing Potential with Increasing 
Specialization 


At some point during development, stem cells commit to becoming a specific type of specialized cell in the adult body. As stem cells move through the differentiation spectrum from the undifferentiated fertilized egg toward a more fully differentiated and mature state, each cell can generate fewer and fewer different types of cells. When the cell becomes terminally differentiated, it adopts a final form and function.


Two biological characteristics define all stem cells: (1) 
stem cells can produce identical cells by cell division 
(self-renewal), and (2) stem cells can cease cell division 
and differentiate into multiple types of cells depending on 
developmental potential. 
THE POTENTIAL AND THE PROBLEMS OF
EMBRYONIC STEM CELLS 


In the developing embryo, ESCs grow in the inner cell 
mass (ICM) of very early, blastocyst-stage embryos (see 
Figure 12.2). As the embryo continues to develop, the 
ESCs differentiate into the cells that will eventually give 
rise to all of the cell types in the human embryo. The first 
report of ESCs isolated from human embryos sparked 
a great deal of discussion and debate worldwide. This 
research caused huge excitement in the scientific com- 
munity and fueled speculation about the future potential 
for human ESCs to play a major role in treating diseases, 
injuries, and the effects of aging in humans. 

Embryonic stem cells were first isolated from mice, not humans, and research on ESCs from lab mice laid the groundwork for research on human ESCs. Mouse ESCs were also used to produce gene-targeted knockout mice, an advance that profoundly changed the nature of biomedical research (discussed later).


Embryonic Stem Cells Have the Most 
Developmental Potential 


In early development at the blastocyst stage, a human embryo resembles a hollow ball containing two main types of cells: the outer layer, or trophectoderm, which gives rise to the placenta, and the inner cell mass (ICM), which gives rise to the cells in the embryo itself (Figure 12.7). Inside the blastocyst, a small number of ESCs are generated from cells in the ICM. These pluripotent ESCs differentiate into multipotent stem cells, which in turn develop into the tissue layers and the germ cells of the embryo itself.

Although the ICM is only transiently present in the developing early embryo, once ESCs have been established in lab culture as embryonic stem cell lines, they can often be maintained and propagated for years without losing developmental potential (pluripotency). Human ESC lines are used for developmental research, for toxicology screening assays on new drugs, and to generate new cell replacement therapies.

Clearly the human ESCs have amazing potential for the successful development of cell replacement and regeneration therapies for human patients. However, the fact that these stem cells originate in human embryos raises serious ethical and moral concerns that threaten to undermine the potential benefits that ESCs offer. As a result, intense efforts were initiated to find less-controversial alternative sources of pluripotent human stem cells. ESCs are the most widely studied type of pluripotent stem cell, but to date ESCs have been derived successfully from the embryos of only three mammals: mice, nonhuman primates, and humans. The 2001 restrictions on federal funding for stem cell research using human ESCs helped to encourage efforts to develop alternate sources of human stem cells for research and medicine, with some surprising successes.

FIGURE 12.7 Embryo cells move during development. (A) The blastocyst embryo is a hollow ball of cells (shown in cross-section) that contains the inner cell mass, which gives rise to the pluripotent ESCs inside the blastocyst. (B) Gastrulation begins with cell migration into the hollow ball (arrows) and the formation of the three major cell layers in the embryo: the ectoderm, endoderm, and mesoderm.


The first embryonic stem cells were isolated from mice, not humans. Since then, work on ESCs from mice laid the
fundamental groundwork for research on human ESCs and 
promoted the investigation of possible applications of ESCs 
to treat human diseases and disorders. 


Human Adult Stem Cells Have Limited 
Developmental Potential 


Most cells in the adult human body are not stem cells but 
are fully (terminally) differentiated into over 200 types of 
specialized cells that perform many important functions 
in the body (Figure 12.8). Examples of fully differentiated 
cells include red blood cells that carry oxygen, epithelial 
cells that line the walls of the small intestine, skin cells 
covering the body, cardiac muscle cells that allow the 
heart to beat, and nerve cells that develop long exten- 
sions that transmit impulses to the brain. 

Although the majority of adult cells are fully dif- 
ferentiated and specialized, the human body also con- 
tains adult stem cells (somatic stem cells) that have 
the power to generate specialized cells to replace 
damaged tissues following injury or disease. Adult 
stem cells are part of a highly regulated system that 
produces new cells in the right tissue at the right time 
and in the right amount to meet the needs of the body 
without causing uncontrolled cell growth or disease. 


Chapter | 12 Stem Cell Research 




FIGURE 12.8 Adult cells have specialized functions and are fully 
differentiated. Most cells in the human body have differentiated into 
specialized cells that do not divide but perform many specialized 
functions in the body. Examples of fully differentiated somatic cells 
include the red blood cells that carry oxygen, the epithelial cells that 
line the walls of the small intestine, cardiac muscle cells that make the heart beat, and nerve cells that transmit impulses to the brain.


Adult stem cells have now been isolated from tissues such as brain, bone marrow, peripheral blood, blood vessels, skeletal muscle, the epithelia of the skin and digestive tract, cornea, dental pulp, retina, and liver.
The fact that adult stem cells have been found in so 
many different adult tissues suggests that most tissues 
have a specific adult stem cell population that is avail- 
able for limited repair and regeneration of that tissue. 


Adult stem cells are relatively rare multipotent somatic 
stem cells that reside in adult organs and tissues. In the 
human body, the adult stem cells give rise to a small 
number of specialized cell types in the tissue from which 
they were derived. 
Adult Stem Cells Generate Specialized
Human Cells 


Adult stem cells are multipotent, which means that 
they have limited developmental potential. Regardless 
of the source, adult stem cells can differentiate only 
into cells of the same type as the tissue from which 
they originated. For example, hematopoietic stem cells 
can generate only blood and immune system cells, and neural stem cells can generate only specialized
cells found in the nervous system. Nevertheless, even 
though adult stem cells have limited developmental 
potential, in many cases they present a possible alter- 
native to human ESCs for use in cell and tissue trans- 
plantation therapies. 

Here we will focus on three well-studied types of adult stem cells: the hematopoietic stem cells (HSCs), which give rise to the different types of specialized cells in the blood and immune systems (Figure 12.9); the mesenchymal stem cells (MSCs), which produce bone and cartilage; and the neural stem cells (NSCs), which generate nerves and brain tissues. Although adult stem cells reside in the adult body, they retain the main features of all stem cells: the ability to reproduce as undifferentiated cells and the capacity to change developmental direction and generate different kinds of specialized cell types.


Adult stem cells are multipotent and have limited developmental potential compared to the pluripotent ESCs. Still, adult stem cells are capable of generating certain types of
specialized cells in the body and can potentially be used 
in cell transplantation therapies. 


Hematopoietic Stem Cells Generate 
Blood Cells 


Hematopoietic stem cells (HSCs) have been a research 
focus because HSCs generate all of the different types 
of specialized cells in the human blood and immune 
systems. The HSCs are familiar to the public because 
they provide the cells most commonly used in bone 
marrow transplants to treat cancer and other diseases of 
the blood. The traditional sources of HSCs for cell trans- 
plant therapies include the bone marrow in the adult 
long bones and the peripheral blood. More recently, 
the HSCs for transplantation have been obtained from 
umbilical cord blood, which is easily collected at birth 
and can be cryopreserved for long-term storage. Once 
thawed, the cord blood stem cells can be readily grown 
in the lab to provide HSCs with no risk to the donor. 
FIGURE 12.9 Hematopoietic stem cells produce blood cells. The hematopoietic stem cell (HSC) family tree shows how the HSCs generate
different types of cells in the blood. The HSCs also generate the lymphoid progenitor cells, which give rise to the T and B lymphocytes and 
NK cells, and the myeloid progenitor cells, which give rise to macrophages, granulocytes, platelets, and erythrocytes (red blood cells). As 
cells become more specialized, they lose some developmental potential. 
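The HSC family tree in Figure 12.9 can also be read as a simple tree data structure. The Python sketch below is illustrative only: it assumes the lineages are exactly those named in the caption, and it uses "developmental potential" loosely to mean the number of terminal cell types a cell can still produce. The names `HSC_TREE`, `find`, and `potential` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical encoding of the Figure 12.9 family tree: each key is a cell type,
# and its value holds the cell types it gives rise to (empty = terminal type).
HSC_TREE: Dict[str, dict] = {
    "hematopoietic stem cell": {
        "lymphoid progenitor": {
            "T lymphocyte": {},
            "B lymphocyte": {},
            "NK cell": {},
        },
        "myeloid progenitor": {
            "macrophage": {},
            "granulocyte": {},
            "platelet": {},
            "erythrocyte": {},
        },
    }
}


def find(tree: dict, name: str) -> Optional[dict]:
    """Return the subtree of descendants for `name`, searching depth-first."""
    for cell, children in tree.items():
        if cell == name:
            return children
        found = find(children, name)
        if found is not None:
            return found
    return None


def potential(tree: dict, name: str) -> int:
    """Count the terminal cell types reachable from `name` (a leaf counts as 1)."""
    subtree = find(tree, name)
    if subtree is None:
        raise KeyError(name)
    if not subtree:
        return 1
    return sum(potential(subtree, child) for child in subtree)


if __name__ == "__main__":
    for cell in ("hematopoietic stem cell", "lymphoid progenitor",
                 "myeloid progenitor", "erythrocyte"):
        print(f"{cell}: {potential(HSC_TREE, cell)} terminal cell type(s)")
```

Moving down the tree, the count falls from 7 (the HSC) to 3 or 4 (the progenitors) and finally to 1 (a terminally differentiated cell), mirroring the caption's point that cells lose developmental potential as they specialize.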




Mesenchymal Stem Cells Generate Bone, 
Cartilage, and Muscle 


The mesenchymal stem cells (MSCs) in the adult bone marrow are necessary for the body to generate tissues such as bone, cartilage, muscle, ligament, tendon, adipose (fat) tissue, and bone marrow. The MSCs represent an ideal population of adult stem cells for stem cell therapies because they are comparatively easy to isolate from the body, can be grown easily in lab cultures, and can be induced to differentiate into lineage-specific cell types without becoming contaminated with other cell types (Figure 12.10). MSCs can be grown for many generations in the laboratory and still retain a stable morphology and normal chromosome complement. These properties explain why MSCs are well suited for autologous stem cell transplants, in which the patient’s own stem cells are used to replace damaged or diseased tissues in the patient’s own body. MSCs taken directly from a patient are cultured in the laboratory, where they are treated with reagents that induce them to differentiate into specific types of specialized cells, which are then transplanted into the patient. This approach eliminates the risk that the patient will reject the transplanted cells because the same individual is both donor and patient.

Both the HSCs and MSCs develop in the bone marrow, but these two cell types can be easily separated from each other in laboratory preparations of bone marrow cells because they have different physical properties on their cell surfaces. As a result, the MSCs attach to the bottom of the plastic culture plates, whereas the HSCs remain unattached and float in the medium. Proliferating MSCs (stromal stem cells) have a fibroblast-like morphology, but they retain the ability to differentiate into multiple mesenchymal cell lineages (see Figure 12.10). MSCs have been used in regenerative stem cell therapies to treat degenerative diseases such as arthritis.


MSCs can be grown for many generations in the laboratory, maintain a stable chromosome number, and retain the ability to differentiate into pure populations of specific types of bone, cartilage, and muscle cells.
FIGURE 12.10 Mesenchymal stem cells make bone and cartilage. (A) Mesenchymal stem cells (MSCs) give rise to three main types of cells: (B) osteocytes in bone (the calcium deposits are stained with alizarin red), (C) cartilage cells (proteoglycans are stained with toluidine blue), and (D) adipocytes (fat cells are stained with oil red).


FIGURE 12.11 Human neural stem cells (hNSCs) differentiate into specialized neural cells. (A) Neuroepithelial stem cells. (B) Oligodendrocytes. (C) Progenitor neuroglial cells.


Neural Stem Cells Generate Specialized
Nerve Cells


Research shows that, except after severe damage to the spinal cord, it is not correct to conclude that injured nerve cells never regrow. In 1992, Reynolds and Weiss were the first to isolate neural stem cells (NSCs) from the brains of adult mice and to grow the dividing NSCs successfully in lab cultures for long periods of time. This work challenged the “no new neurons” dogma and proved to be a major breakthrough in the fields of neurobiology and cell regeneration. Remodeling of the neural network in the human brain and the related process of nerve growth (neurogenesis) continue throughout our lives. Nerve cells die off and new nerve connections form in response to learning, injury, disease, the environment, and much more.

Two main regions of the human brain actively grow 
new nerve cells: the hippocampus, which is important 
for memory formation, and the ventricular and subventricular zones, both areas of actively dividing cells in
the fetal and adult brain. In adult rodents, neurogenesis 
also occurs in the olfactory bulb, the part of the brain 
that receives odor signals from the nose. Scientists still 
do not understand how the body controls the growth of 
neural stem cells or the precise signals needed to initiate 
differentiation into different types of nerve cells. 

Neural stem cells are also called neural progenitor cells because they give rise to three lines of specialized nerve cells: (1) neurons, of which there are hundreds of different types; (2) astrocytes, which support the neurons; and (3) oligodendrocytes, which make myelin, a lipid-rich material that wraps around nerve fibers. Myelin is vital for the proper transmission of electrical signals along the nerves and is damaged in degenerative nerve diseases such as multiple sclerosis (Figure 12.11).

To successfully grow human nerve cells in culture, it is necessary to provide an environment that mimics the physiological conditions of the human body, including the appropriate temperature, atmosphere, and nutrient medium. Cell culture methods usually require serum, which is blood plasma without the clotting factors, as a nutrient source, but scientists found that prolonged exposure to serum prevented the NSCs from differentiating into specialized nerve cells. A huge advance came with the development of a serum-free culture medium for NSC growth that contained large amounts of epidermal growth factor (EGF) and basic fibroblast growth factor (bFGF). The large quantities of these growth factors needed to propagate NSCs in culture are produced by recombinant DNA methods. Growth factors are necessary for the NSCs to proliferate (divide) and grow as a single layer of cells on a surface substrate such as fibronectin (Figure 12.12).




FIGURE 12.12 Human neural stem cells growing in culture. (A) The NSCs are growing attached to a fibronectin substrate. (B) Cells proliferate and begin to aggregate. (C) Further aggregation of cells. (D) Neurosphere induction by growing cells in nutrient-restricted medium.
When induced to differentiate, some of the NSCs detach from the surface of the plate and form multicellular structures called neurospheres (see Figure 12.13). This type of NSC culture can be propagated indefinitely in the lab by periodically breaking up the neurospheres and dispersing the cells into plates with fresh medium for further growth. The NSCs can be induced to develop into specific types of neural cells by adding or removing specific proteins such as growth factors. The conditions needed to induce stem cells to differentiate are determined experimentally in the research lab and depend on the type of stem cell and the type of specialized cell desired.

The ability to get NSCs to generate specialized neurons in the lab is an amazing achievement, but better methods are still needed to prepare large quantities of neural stem cells for routine use in treating degenerative nervous system diseases. Specialized nerve cells remain under intense study, and scientists continue to work toward that goal.

The types of specialized cells generated by differentiation of NSCs are routinely identified in the lab using fluorescent immunocytochemical staining (see Figure 12.13). In this method, the cells are treated with specific antibodies that have been raised against different proteins (antigens) characteristic of neurons, astrocytes, or oligodendrocytes. Because each antibody recognizes and binds to a specific protein and carries its own fluorescent label, the different types of nerve cells in a population of cells can be easily identified by color (see Figure 12.11).

Highly purified preparations of neural stem cells 
can be derived from both cultured mouse and human 
ESCs by following established research protocols. Both 
mouse and human ESCs have been used to generate 
specialized neural cells that were then used to successfully restore partial function in test animals with
spinal cord injuries. 


ESC-derived NSCs represent a promising future for the development of stem cell treatments for conditions such as multiple sclerosis, spinal cord injury, and Parkinson’s disease. These three conditions are prime targets for stem cell replacement therapies because they are each caused by the loss of one type of mature, differentiated cell.


REPROGRAMMED ADULT CELLS 
REGAIN POTENTIAL 


People have long wanted to turn back the clock and 
regain their youth. For specialized adult cells, this 
would require differentiated cells to be recalibrated 
to a younger, less specialized state through genome 
reprogramming. The biological process of genome 
reprogramming is thought to reset the cell’s DNA 
genome to a pluripotent state, which then permits the 
reprogrammed cell to change developmental fate and 
become a different type of specialized cell. An exam- 
ple of this type of dramatic change occurs just after 
fertilization when the adult sperm and egg genomes 
come together to create a new reprogrammed embry- 
onic genome. The reprogrammed embryonic genome 
will direct the development of all the cells in the 
embryo, including the ESCs that give rise to the entire 
human body. The reprogramming mechanism does not change or mutate the DNA sequence of the genome; instead, it alters the epigenetic information carried in the DNA modification patterns and in the proteins bound to the chromosomes.


FIGURE 12.13 Human neural stem cells develop in culture. (A) Neurosphere structure grows attached to the substrate (100X magnification). (B) Developing neuronal outgrowth contains DNA stained with DAPI (blue), as well as the Nestin (red) and neuronal Tuj1 (green) proteins (200X magnification). (C) Budding neurosphere shows expression of the Sox-2 (red) and Nestin (green) proteins (400X magnification).
Genome reprogramming is an active area of research,
driven in large part by the need to understand the 
molecular mechanisms that drive cell differentiation and 
the desire to develop patient-specific stem cells. Patient- 
specific stem cells are derived from and genetically 
related to a specific patient, making these cells espe- 
cially useful in treating that particular patient’s disease. 

Most cell replacement therapies suffer from the 
disadvantage that the patient is usually treated with 
cells that are foreign to the patient’s body. This situa- 
tion triggers tissue rejection, and the patient’s own 
immune system attacks the transplanted foreign cells. 
Tissue rejection requires treatment with strong immunosuppressive drugs that weaken the patient’s immune system in the hope of halting the rejection process. Unfortunately, many complications and side effects can accompany immunosuppressive treatments, making it even more important to develop cell replacement therapies that do not cause this immune response. Patient-specific (donor-specific) stem cells would eliminate the need for immunosuppressive drugs because these stem cells are derived from the patient and are therefore a genetic match to the patient.


Genome reprogramming is an epigenetic process that can turn back the clock in adult skin cells and cause fully differentiated cells to adopt a much younger, less specialized state.


Reprogrammed Cells Avoid Human 
Embryos 


Reprogrammed genomes have the power to dramati- 
cally alter the fate of specialized adult cells, as dem- 
onstrated in the following examples: 


Somatic cell nuclear transfer. The genome in the 
nucleus of a specialized adult cell is reprogrammed 
when the somatic cell nucleus is transferred into 
an empty egg cell that has already had its nucleus 
removed. Genome reprogramming permits the 
somatic cell nuclear genome to direct the 
development of an entire embryo generated from 
an unfertilized egg cell. 

Cell fusion. When a specialized adult cell (containing 
an adult nucleus and genome) and an embryonic 
stem cell (containing an embryonic nucleus and 
genome) are fused together, the contents of the 
two cells co-mingle, but the two nuclei remain 
separate. The adult nuclear genome is 
reprogrammed by key proteins in the cytoplasm 
of the embryonic stem cell. 

Induced pluripotent stem (iPS) cells. Adult human skin cells grown in the lab were treated with DNA encoding four different transcription factor proteins. When expressed in the adult skin cells, these transcription factors altered gene expression in the cell and produced proteins that reprogrammed the adult genome into an embryonic genome. As a result, the adult skin cells were converted into undifferentiated cells called induced pluripotent stem (iPS) cells that resemble normal embryonic stem cells. Most excitingly, scientists have successfully triggered the iPS cells to differentiate into specialized nerve cells.


Cloning by Somatic Cell Nuclear 
Transfer 


Somatic cell nuclear transfer (SCNT) involves replac- 
ing the nucleus of an oocyte (an unfertilized egg cell) 
with a donated nucleus from a differentiated adult 
somatic cell (Figure 12.14) (see Chapter 14). The 
cytoplasm in a mammalian oocyte contains all of the 
molecules needed to reprogram the adult genome to 
direct the “cloned” egg to develop into a blastocyst 
embryo in the lab. Depending on the specific goal of the work and the organism involved, the “cloned” blastocyst embryos (generated by SCNT) are used for either reproductive cloning or therapeutic cloning.

Reproductive cloning is used to generate genetic copies of an entire living organism. This approach gave us the first mammal cloned from an adult cell, Dolly the sheep (see Chapter 14). Reproductive cloning first uses somatic cell nuclear transfer (SCNT) to generate cloned blastocyst embryos carrying the genome of the donor cell.
The blastocyst embryos are then implanted into surro- 
gate females, which carry the developing embryos to 
term and give birth to the live cloned animals. Born in 
1997, Dolly the sheep was the result of somatic cell 
nuclear transfer and reproductive cloning. Dolly was 
a major scientific accomplishment and a very photo- 
genic sheep that introduced the public to the prospect 
of cloning animals almost overnight. Of course, for 
many reasons, the reproductive cloning of humans is 
banned in the United States. 

Dolly was cloned from the single nucleus of a mammary gland epithelial cell obtained from a 6-year-old adult sheep (see Figure 12.14). The nucleus was removed from the mammary cell and introduced into the empty cytoplasm of an unfertilized sheep egg cell (the egg nucleus had already been removed). Under the influence of proteins from the egg’s cytoplasm, the adult genome was reprogrammed and directed the cell to start embryo development. Once the embryos had reached the blastocyst stage of development, they were implanted into surrogate females to nurture the growing embryos to birth.

There is no genetic relationship between the 
implanted embryos and the surrogate females that carry 
the embryos to term. The genomes of the cloned ani- 
mals are identical to the genomes in the donor nuclei; 
genetically identical copies are called clones. It is 
important to note that cloning animals is a very inef- 
ficient process overall, and it took persistent efforts to 
produce Dolly and other cloned animals. Nevertheless, 17 species of animals were cloned by reproductive cloning methods in the 10 years between 1997 and 2007, including sheep, mouse, bull, pig, goat, gaur (a wild ox), mouflon sheep, rabbit, cat, mule, rat, African wildcat, dog, water buffalo, horse, ferret, and wolf.
The reproductive cloning process that produced Dolly is quite different from the use of somatic cell nuclear transfer (SCNT) in therapeutic cloning (Figure 12.15). Unlike in reproductive cloning, the blastocyst embryos generated by therapeutic cloning methods are never implanted in surrogate females and are never used to produce pregnancy. Therapeutic cloned blastocysts are most often used as a source of donor-specific embryonic stem cells, which can generate specific cells to treat the patient’s disease while avoiding the risk of transplant rejection (see Figure 12.15). To date, scientists have not been able to successfully generate cloned human blastocyst embryos by SCNT. Cloned human embryonic cells have been observed to undergo the early stages of cell division and development, but so far they have not reached the blastocyst stage.

FIGURE 12.14 Dolly was created by reproductive cloning. (A) The donor nucleus was removed from a mammary gland cell of an adult sheep and introduced into the empty cytoplasm of an enucleated sheep egg to make a cloned embryo. Proteins in the egg cytoplasm reprogram the donated adult sheep genome, and the reprogrammed genome then directs the cell to become less specialized. The resulting cloned blastocyst embryos were implanted into surrogate females, and Dolly was born. (B) Dolly, the first mammal cloned from an adult cell, is shown here with her biological daughter.

FIGURE 12.15 Therapeutic cloning by SCNT. (A) The nucleus of an egg cell (oocyte) is removed and replaced by the nucleus of an adult (somatic) cell donated by a specific patient. (B) This ‘cloned egg’ develops into a cloned blastocyst embryo. (C) The ESCs from cloned blastocyst embryos have the potential to develop into any type of adult cell and tissue, all genetically matched to the specific human donor.


Genome Reprogramming by Cell Fusion 


When two cells are fused together, the two nuclei are suddenly housed in the same cytoplasm and can influence each other. In many cases, the genomes of one or both nuclei undergo reprogramming. When ESCs are fused
to adult skin cells, the genome of the adult skin cell 
is exposed to the cytoplasm of the pluripotent ESC. 
Molecular components and proteins in the cytoplasm 
of the ESC reprogram the adult cell genome, causing 
the adult skin cell to convert to a pluripotent state. 

Researchers fused human ESCs with human fibroblast skin cells to generate “cybrid” cells, in which factors in the cytoplasm of the ESC reprogram the adult cell genome. Although each cybrid cell contains two
nuclei, one nucleus from the ESC and one nucleus 
from the fibroblast, the cybrid cell maintains a stable 
tetraploid genome containing twice the normal diploid 
DNA content. In addition, the cybrids have the same 
cellular morphology, growth rate, and cell surface 
proteins as do human embryonic stem cells (hESCs). 
Surprisingly, cybrid cells can differentiate into the dif- 
ferent specialized cell types, much like the human 
embryonic stem cells that give rise to the three cell lay- 
ers in the developing embryo (Figure 12.16). 


FIGURE 12.16 Human embryonic stem cells in culture. Human
ESCs grow in tightly clustered colonies of cells that are in very
close contact with neighboring cells in the colony. The hESCs are
commonly co-cultured with mouse embryonic fibroblast (MEF) cells,
which provide factors that support the growth of the undifferentiated
hESCs. Arrowhead shows an MEF fibroblast. Arrow shows differentiating
cells at the edge of the hESC colony (100× magnification).
INDUCED PLURIPOTENT STEM CELLS


The studies on induced pluripotent stem (iPS) cells 
will help scientists to better understand the molecular 
mechanisms responsible for genome reprogramming. 
In the future, scientists plan to use the iPS approach
to generate pluripotent stem cells with greater genetic
diversity, including cell lines with known genetic muta-
tions or a possible predisposition to develop certain
diseases. The ability to generate iPS cells from geneti-
cally diverse populations should help us to better
understand the causes of these diseases and to develop
better drugs designed to treat specific genetic diseases.

The discovery that adult somatic (body) cells can 
be reprogrammed to behave like embryonic stem cells 
is exciting, but scientists still need to better under-
stand the similarities and differences between iPS
cells and normal ESCs, and it is not yet clear how well
the iPS cells will function in the laboratory and in the 
clinic. Nevertheless, the discovery of iPS cells repre- 
sents a huge leap forward for the field of regenerative 
medicine. The potential to make pluripotent stem cells 
without using human embryos brings the reality of 
patient-specific pluripotent stem cells one step closer. 


Genome reprogramming occurred when transcription factor
genes such as Oct4, Sox2, and Nanog were introduced into
adult somatic skin cells. The changes in gene expression
triggered by these transcription factors led to the produc-
tion of proteins that reprogrammed the genome and gave
rise to induced pluripotent stem (iPS) cells.


EPIGENETIC CHANGES IN GENOME 
REPROGRAMMING 


The need to understand the genes involved in cell dif- 
ferentiation has created a strong motive for research- 
ers to dissect the molecular mechanisms involved in 
reprogramming the DNA genome of a cell. Genome 
sequences differ slightly between people, but within
one person’s body, all of the cells contain the same
genes and have the same genome DNA sequence. The
different types of cells are not different because they
carry different genes in their genomes; they are differ-
ent because they express different genes. Different
gene expression patterns direct the synthesis of differ-
ent proteins in different types of cells and give the
specialized cells their tissue-specific functions. For
example, the genes expressed and the mRNAs made
in a liver cell are very different from the mRNAs made
in a muscle cell, because the liver cell needs proteins
that are different from the proteins needed by muscle
cells. The liver and muscle
Box 12.1 Human Skin Cells Are Converted into Pluripotent Stem Cells


Scientists are studying the gene expression patterns in the 
pluripotent embryonic stem cells growing in the lab to find 
out how the cells control and maintain the pluripotent (undif-
ferentiated) and differentiated states. Because the pluripotent
state of the ESCs depends on the expression of specific genes,
the scientists decided to mimic this gene expression pattern
in fully differentiated adult skin cells.

First the researchers needed to identify which transcrip- 
tion factor proteins are actively involved in maintaining the 
pluripotent state of the cells. They discovered that a small 
number of transcription factor genes, including Oct4, Sox2,
Klf4, and Nanog, among others, have essential roles in main-
taining the cell’s ability to remain pluripotent. These transcrip-
tion factor proteins control the expression (transcription) of a
series of other human genes required to establish and main-
tain pluripotency (Figure 12.17). These genes are vital for pro-
ducing and maintaining the inner cell mass in the blastocyst
and for maintaining the pluripotent state of ESCs.

Scientists reasoned that the transcription factor proteins 
might be able to influence the state of a differentiated adult 
cell by turning on expression of the genes in the skin cells 
that are normally expressed in ESCs, resulting in a change to 
pluripotency. 

In 2007, two independent research groups, James 
Thomson (University of Wisconsin, Madison) (Science) and 
Shinya Yamanaka (Kyoto University) (Cell), tested the idea 
that certain transcription factor proteins might influence the 
state of a differentiated adult skin cell by inducing expression 
of genes that are normally expressed only in ESCs. The sci- 
entists introduced the transcription factor gene DNA into the 
adult skin cells in the lab (Figure 12.18). To their surprise, both
teams found that the transcription factor genes had converted
the adult skin cells into cells that closely resemble embryonic
stem cells. These cells are called induced pluripotent stem cells
(iPS cells) to distinguish them from ESCs (Figure 12.19).

Both research teams then showed that the iPS cells could
be induced to generate different specialized cells in culture,
including nerve cell tissue. The iPS cells appear to meet all the
criteria of embryonic stem cells, including self-renewal and
pluripotency, except that the iPS cells are not derived from
human embryos. ESCs that naturally express the Oct4, Sox2,
and Nanog proteins can be grown for extended periods in the
laboratory, remain pluripotent, and do not acquire the chromo-
somal abnormalities that can lead to cancer. Significantly, the
iPS cells also exhibit high developmental potential and are
capable of sustained cell division without genome damage.
Like normal ESCs, the iPS cells respond to normal cell cycle
regulation and avoid chromosome damage. Potentially, the iPS
cells can differentiate into any of the more specialized cells
derived from the ectoderm, mesoderm, and endoderm germ
layers. To date, iPS cells have formed nerve tissue, but further
research will tell whether iPS cells live up to this prediction.

The ability to induce pluripotent stem (iPS) cells was a
major breakthrough in the field of stem cell science that may
eventually lead to the routine use of patient-specific and disease-specific


FIGURE 12.18 The method used to generate induced pluripotent
stem (iPS) cells: (1) Isolate and culture host cells (fibroblasts).
(2) Introduce the ES-specific genes (iPS factors), carried in a
retrovirus vector, into the cells. Red cells indicate the cells
expressing the added iPS genes. (3) Harvest and culture the cells
using feeder cells (gray). (4) A subset of the cells generates
ES-like colonies, the iPS cells.
FIGURE 12.17 Pluripotency transcription factor proteins are expressed in human ESCs. Cells in these human embryonic stem cell
(hESC) colonies have been stained with antibodies to show the expression of three key pluripotency proteins: (A) Nanog, (B) Sox2, and
(C) Oct4 (400× magnification).
FIGURE 12.19 Human skin cells mimic ESCs and generate nerve cells. (A) Human skin cells in culture before the cells were geneti-
cally modified and converted into ESC-like iPS cells. (B) This photomicrograph shows nerve tissue generated from human skin cells.
The transcription factors cause reprogramming of the skin cell genome, inducing the skin cell to become an embryonic stem cell-like iPS cell.
The black scale bar represents 0.1 millimeters (about the width of a human hair).


pluripotent stem cells in future medical treatments, all without 
the use of human embryos. However, most scientists agree that 
research on human embryonic stem cells is so important that it 
should continue using donated embryos. At this time it is pre-
mature to conclude that the iPS cells created in the lab are iden-
tical to normal pluripotent ESCs and will completely replace
the use of embryonic stem cells in the future. Specific genes turned


cells are very different in form and function, but they
carry exactly the same DNA genomes.

Transcription factors and other specific proteins func- 
tion to control tissue-specific gene expression, which 
permits only certain genes to be turned on in certain 
types of cells at certain times. Eukaryotic genes are reg-
ulated at many points along the gene expression path-
way from DNA to RNA to protein, including transcrip-
tion initiation, RNA processing and transport, and
protein translation and folding (see Chapter 3). Still,
none of these processes is sufficient to explain
what happens during genome reprogramming, when 
an adult nucleus becomes “young” again and adopts a 
pluripotent state. Recent evidence shows that chroma- 
tin, the specific combination of DNA and histone pro- 
teins that make up the structures of the chromosomes, 
plays a key role in the genome reprogramming process. 


All of the different types of cells in a person’s body con-
tain exactly the same genes in the same genome. Different
types of specialized cells are distinguished from each other 
not by carrying different genes in their genomes, but by 
turning on and expressing different genes from the same 
human genome in the different cell types. 


on and expressed after fertilization trigger the growth of the 
inner cell mass within the developing blastocyst embryo. Other 
genes are required to direct the development of the embryonic 
stem cells (ESCs) inside the blastocyst, and to maintain the 
pluripotent state. This includes the transcription factor genes used to
make iPS cells, such as Oct4, Sox2, and Nanog.


Reprogramming Involves Chromatin 
Modification 


Chromatin (DNA and histone proteins) plays a cen-
tral role in packaging a very long double-stranded
DNA helix into a typical chromosome in a eukaryo-
tic nucleus. The highly specialized histone proteins
assemble into protein-DNA complexes called nucleo-
somes that are spaced roughly every 200 base pairs
along the DNA helix (the exact spacing varies between
cell types), like beads on a string. Each nucleosome
core contains an octamer made of two copies each of
four different histone proteins (H2A, H2B, H3, and
H4), with about 147 base pairs of the DNA helix
wrapped almost twice around the outside of the
octamer (Figure 12.20).
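The nucleosome spacing can be turned into a rough estimate of how many nucleosomes package the DNA in one human body cell. The sketch below is a back-of-the-envelope calculation under stated assumptions, not measured data: it assumes a haploid human genome of about 3.1 billion base pairs and the commonly cited figure of about 147 base pairs of DNA in each nucleosome core, and it tries several repeat lengths because the real spacing varies. The function names are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope nucleosome arithmetic. All numbers are rounded
# approximations chosen for illustration, not measured data.

HAPLOID_GENOME_BP = 3_100_000_000  # assumed size of one human genome copy
CORE_DNA_BP = 147                  # assumed DNA wrapped around one histone octamer


def nucleosome_count(genome_bp: int, repeat_length_bp: int) -> int:
    """Approximate nucleosome count if one nucleosome occurs every repeat_length_bp."""
    return genome_bp // repeat_length_bp


def fraction_in_cores(repeat_length_bp: int, core_bp: int = CORE_DNA_BP) -> float:
    """Fraction of the DNA wrapped in nucleosome cores; the rest is linker DNA."""
    return core_bp / repeat_length_bp


if __name__ == "__main__":
    diploid_bp = 2 * HAPLOID_GENOME_BP  # a body cell carries two genome copies
    for repeat in (200, 220, 250):      # the real spacing varies between cell types
        count = nucleosome_count(diploid_bp, repeat)
        linker = repeat - CORE_DNA_BP
        print(f"repeat {repeat} bp: about {count:,} nucleosomes, "
              f"{linker} bp of linker DNA, "
              f"{fraction_in_cores(repeat):.0%} of the DNA in cores")
```

Whichever repeat length is assumed, the estimate comes out in the tens of millions of nucleosomes per cell, with the stretch of DNA between neighboring cores (linker DNA) making up the rest of each repeat.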

The DNA helix and the core histone proteins make up
most of the structure of human chromosomes. Each
histone protein has a compact, globular middle region
flanked by the ends of the protein (see Figure 12.20).
The N-terminal (amino-terminal) ends of the histone
proteins extend outside the nucleosome core, where
they interact with regulatory proteins in the nucleus.
As a result, the histone proteins play an important
role in gene expression.
Enzyme proteins in the nucleus modify the histones by 
adding specific chemical groups to certain amino acids 
FIGURE 12.20 DNA and histone proteins form a nucleosome. (A) The nucleosome contains a core of eight histone proteins with the his-
tone N-terminal ends projecting away from the core. Epigenetic modification of the protein tails allows the histones to transmit signals and
influence gene expression. (B) The first 30 amino acids in the N-terminus of the human histone H3 protein are shown. Several amino acids
in the N-terminus of histone H3 proteins are targets for epigenetic changes (acetylation, phosphorylation, and methylation). The collection of
epigenetic modifications, called the histone code, causes a finely tuned transcriptional response in the cells, which alters the expression of a
subset of genes in the cell.


in the N-terminal ends of the proteins. This covalent 
modification of the histone proteins in chromosomes is 
a good example of an epigenetic modification because 
the genetic signals transmitted by the histone protein 
“code” are not encoded in the primary sequence of 
the DNA genome. Epigenetic modification potentially 
impacts the entire genome, changes gene expression, 
and profoundly alters the state of the cell. Epigenetic
memory is a process that allows cells to maintain their
existing pattern of gene expression, such as the undif-
ferentiated state, even while the cells are exposed to
conditions that usually induce differentiation.
Epigenetic memory is an important function that helps 
to maintain stable cultures of pluripotent stem cells 
over time and helps prevent the formation of cancer 
cells derived from the pluripotent cells. 
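To make the idea of a histone “code” more concrete, the sketch below stores the first 30 amino acids of human histone H3 as a one-letter string and checks a few well-studied modification sites against it. The sequence is the standard N-terminal sequence of H3, numbered from the alanine that follows the removed initial methionine. The small table linking each mark to “active” or “repressed” chromatin is a deliberate simplification (real effects depend on context and on combinations of marks), and the function names are hypothetical, chosen for illustration only.

```python
# Toy model of histone H3 tail marks: a simplified illustration,
# not a real histone-code decoder.

# First 30 residues of human histone H3 (residue 1 is the A after the removed Met).
H3_TAIL = "ARTKQTARKSTGGKAPRKQLATKAARKSAP"

# A few well-studied marks and the chromatin state they are commonly
# associated with. Deliberately small and simplified.
MARK_EFFECTS = {
    ("K", 4, "me3"): "active",      # H3K4me3: found at active gene promoters
    ("K", 9, "ac"): "active",       # H3K9ac: acetylation helps open chromatin
    ("K", 27, "ac"): "active",      # H3K27ac: active enhancers and promoters
    ("K", 9, "me3"): "repressed",   # H3K9me3: compact heterochromatin
    ("K", 27, "me3"): "repressed",  # H3K27me3: Polycomb-associated silencing
}


def mark_name(residue: str, position: int, modification: str) -> str:
    """Build the usual shorthand, e.g. ('K', 4, 'me3') gives 'H3K4me3'."""
    return f"H3{residue}{position}{modification}"


def check_site(residue: str, position: int) -> bool:
    """True if the H3 tail really has `residue` at 1-based `position`."""
    return 1 <= position <= len(H3_TAIL) and H3_TAIL[position - 1] == residue


def summarize(marks):
    """Count active vs. repressive marks on one hypothetical nucleosome."""
    tally = {"active": 0, "repressed": 0}
    for residue, position, modification in marks:
        if not check_site(residue, position):
            raise ValueError(f"H3 has no {residue} at position {position}")
        effect = MARK_EFFECTS.get((residue, position, modification))
        if effect is not None:
            tally[effect] += 1
    return tally


if __name__ == "__main__":
    # A hypothetical promoter carrying one activating and one repressive mark.
    promoter = [("K", 4, "me3"), ("K", 27, "me3")]
    print([mark_name(*m) for m in promoter], summarize(promoter))
```

The point of the sketch is that the “signal” lives in which chemical group sits on which amino acid of the histone tail, while the DNA sequence itself is unchanged.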


Research shows that modification of the histone proteins in
chromosomes helps control the expression of genes that direct
the processes of cellular differentiation and genome repro-
gramming. Histone modification is an epigenetic change
that occurs without changing the DNA sequence of the
reprogrammed genome.


Genome reprogramming occurs after an egg cell
(oocyte) containing 23 chromosomes is fertilized
by a sperm, which also carries 23 chromosomes. In
sperm, the genome DNA is packaged into compact
structures using protamines instead of the usual his-
tone proteins (Figure 12.21). After the egg is fertilized,
the protamines are removed and the sperm genome
DNA is packaged into 23 traditional chromosomes by
acetylated histone proteins (histones modified with
acetyl groups). The acetyl groups on the histone pro-
teins help to maintain the chromatin structure in an
“open” conformation, which offers access to the DNA
genome and helps to prepare the DNA for transcrip-
tion. Genome reprogramming continues as methyl
groups are removed from the genome DNA (DNA
demethylation) and methyl groups are added to the
histone proteins (histone methylation). In addition,
the structure of the egg chromosomes changes as the
oocyte-specific linker histone H1oo protein is removed
from the chromatin (see Figure 12.21).

In early embryo development, the embryo’s
epigenome, the full set of chemical modifications
to its DNA and histone proteins, is in a highly
dynamic state. During blastocyst formation, the
epigenome undergoes changes that set the stage for
further embryo development. As the cells transition
from pluripotent to
FIGURE 12.21 Epigenetic changes in chromatin structure are important for genome reprogramming. The chromosome DNA in sperm is
highly compacted with protamines (left), but after fertilization, the protamine is removed and the DNA is packaged into normal chromosome
structures containing acetylated histones that maintain an “open” chromatin conformation that is ready for transcription (right). The demeth-
ylation of DNA and the methylation of the histone proteins are part of the genome reprogramming process that prepares the chromosome DNA
for transcription.


more differentiated and restricted states, large changes 
in gene expression take place. 


The Future of Stem Cell Research and 
Therapeutics 


Understanding the genes and underlying molecu- 
lar mechanisms that maintain a cell in a pluripotent 
state or trigger cell differentiation is absolutely criti- 
cal if stem cell therapies are to become available as 
routine treatments for disease. Scientists need to have 
a detailed understanding of the events in the cell that 
influence the function of the genome, but the small
number of genes that can be monitored using conven-
tional techniques has limited the search for answers
to questions about the human genome. However, the
recent development of genome-wide screening meth-
ods and systems-level evaluations now permits scien-
tists to assess gene expression across entire populations
of cells.

The ESC system is particularly well suited for the 
application of genomic and systems-level genetic 
screens. The earliest stages of embryo development 
and commitment to differentiation can be evaluated 
using a combination of genomic methods includ- 
ing genetic and pharmacological manipulation of 
the cells. Stem cells are transient in the developing
embryo and exist in very small numbers, but the abil-
ity to grow ESCs in culture now provides sources of
progenitor cells in quantities that may be sufficient
for use in clinical therapies.

The systems biology approach to studying stem cells 
provides a dynamic overview of the molecular pathways 
at work in a given cell as the cell responds to differ-
ent environmental signals. This work could lead to the
development of more methods or agents that reliably
induce pluripotent cells in the lab to differentiate into
specific cell lineages.

The huge potential of human ESCs to generate spe- 
cialized cells for cell replacement therapies has led 
to widespread proposals for the use of stem cells as 
therapeutic agents, including ways to overcome the 
limitations imposed by adult stem cells in therapeutic 
treatments. The very nature of an embryonic stem cell 
is to generate all of the different types of cells in the 
body, but this presents real challenges for scientists as 
well, because they must determine how to trigger the 
ESCs to differentiate into the type of cell needed for 
a particular cell replacement therapy. We know little 
about the factors required to induce ESCs to differenti- 
ate into nerve cells, which are probably different from 
those required to induce muscle cell development. 

In addition to learning how to induce ESCs to dif-
ferentiate into specialized cells, scientists also need
methods to detect undifferentiated cells that are
mixed in with the differentiated cells. The potential
genome instability of the undifferentiated stem cells
carries the risk of developing cancer cells in the
transplant recipient under some conditions. Progress
in stem cell research has continued despite the con-
troversy, technical difficulties, and funding problems
precisely because of the substantial potential long-term
benefits of stem cell work. Human studies and research
on model animal systems will continue to be at the
forefront of efforts to develop effective regenerative
medical treatments based on advances in stem cell
science.


Cells must accurately control the expression of certain 
genes that are turned on in early progenitor cells and are 
gradually turned off and then silenced at later developmen- 
tal stages. Meanwhile, subsets of cell type-specific genes 
are turned on to make proteins that are required for the next 
stage of embryo development. 


The ability to insert genes into specific sites in the 
genomes of human ESCs by DNA transformation is a 
major future goal for scientists in the field of regenera-
tive medicine. The plan is to introduce DNA into the
cells by transformation and then to grow the trans-
formed cells containing the inserted DNA. When the
cells are transplanted into the patient, the cells will
all carry the repaired therapeutic DNA in their chro-
mosomes. Using this approach, scientists could repair
or replace a defective or mutant gene in the genome of
the stem cell. The repaired gene would be inherited by
all the progeny cells, effectively correcting the defect
in every future cell derived from the transplant.
Achieving this goal is still years away, and there are
many technical obstacles to be overcome and ethical
issues to be discussed (see Chapter 11). Successful gene therapy
relies on developing highly effective gene targeting 
techniques that will accurately insert DNA into tar- 
get sites in the genome DNA of human ESCs. When 
appropriate gene targeting technologies for hESCs 
have been developed, the hESCs will become an even 
more important resource for the study of human devel- 
opment and disease and will also greatly promote the 
application of genetically engineered cellular thera- 
peutics to the physician’s arsenal for the treatment of 
disease or injury. 

In the 17 years between the first report of mouse
embryonic stem cells and the subsequent identification
of human embryonic stem cells, scientists constructed
knockout mutations in the genomes of many different
organisms. This is in addition to the hundreds of
knockout strains of mice, each strain having a differ-
ent gene deleted (knocked out) from the mouse
genome. Work with mouse ESCs has greatly aided the
study of human ESCs and continues to provide impor- 
tant information to complement and enhance research 
on human ESCs. 


Scientists constructed a collection of mouse knockout 
strains that represent mutations in about half of the genes 
predicted in the mouse genome. Knockout mice with spe- 
cial mutations are custom-designed and available by spe- 
cial order, providing scientists with many research options. 


Box 12.2 Knockout Mice Are Right on Target 


In 2007, the Nobel Prize in Physiology or Medicine was awarded
to Mario R. Capecchi, Martin J. Evans, and Oliver Smithies for
their work on knockout mice. They developed a landmark method
that uses mouse embryonic stem cells to target and inactivate a
single, specific gene in the mouse genome at will. The different
strains of knockout mice were analyzed
to determine how the gene knockout and the loss of the normal 
mouse protein affected the normal traits exhibited by the mice. 
Capecchi and Smithies knew how to genetically modify 
specific genes in the mammalian genome but neither scientist 
had ever worked with pluripotent stem cells, so they began to 
collaborate with Evans, who had extensive stem cell experience. 
This team successfully produced live animals carrying geneti- 
cally modified genomes accurately inherited from the knockout 
parent mouse. Scientists constructed a collection of pluripotent
mouse embryonic stem cell lines, with each cell line carrying a
different knockout gene mutation. This collection of knockout
mice has provided the scientific research community with
a very valuable research tool used to study many different
human diseases. The knockout mutations carried by these
mouse strains are well characterized, and because the strains
share highly similar genetic backgrounds, the traits (pheno-
types) of each knockout mouse can be attributed mainly to
the knockout mutation and the lack of an important protein
product rather than to a spurious mutation elsewhere
in the mouse genome. Research involving knockout mice
and other organisms carrying knockout mutations or other 
forms of transgenic genomes continues to have a significant 
impact on scientific research. Gene targeting is a powerful 
tool to study many biological processes including mamma- 
lian cell, tissue, organ and body development, physiology, 
aging, and human disease. Six years before the Nobel Prize,
in 2001, Capecchi, Evans, and Smithies received the Lasker
Award for Basic Medical Research for their innovative
research on knockout mice.
SUMMARY


The field of regenerative medicine has made real 
progress since scientists discovered the amazing 
developmental potential of embryonic stem cells. 
Regenerative medicine is a highly dynamic area of 
research and clinical medicine in which new discover-
ies continue to challenge long-held beliefs about the
limits of cell regeneration. Stem cells are often in the
national news and seem to hold a certain fascination 
for laypersons and scientists alike. The discovery of 
human ESCs had an impact on the general public that 
was equaled by few other scientific discoveries. The 
1998 report on human ESCs generated wide-ranging
excitement because of the possibility that these cells
could fundamentally change the way we study cell and
tissue development, how we produce and test new 
drugs, and how we treat numerous degenerative dis- 
eases. Amid all this excitement and potential, however, 
is the big drawback that embryonic stem cells are typi- 
cally isolated from human embryos, making them the 
subject of a great deal of controversy. 

Although ESC research is highly controversial, the
potential for stem cell therapy to dramatically improve
treatments for degenerative diseases continues to
keep stem cell research in the spotlight. Pluripotent
stem cells can divide continuously while undifferenti-
ated and have the potential to differentiate into all
cell types in the body. Multipotent stem cells include
adult stem cells such as the hematopoietic stem cells,
which are found in the bone marrow and have been
used to treat diseases and illnesses since the late
1950s. Few ethical
concerns have been raised about the use of adult or 
somatic stem cells in research or transplant therapies, 
as they have long been used to treat diseases. The main 
controversy in stem cell research lies in the research 
using pluripotent embryonic stem cells derived from 
human embryos. Scientists, laypersons, and politicians 
have voiced views on the scientific, social, and ethical 
impact of this science. 

Fortunately, recent research has revealed new ways 
to generate pluripotent human stem cells including the 
exciting discovery that adult skin cells can be repro- 
grammed to behave like embryonic stem cells and can 
even differentiate into nerve tissue. Of course, much 
work remains to find out how well these induced 
pluripotent (iPS) cells will work under clinical treat- 
ment conditions. Nevertheless, the ability to make 
pluripotent stem cells without using human embryos 
represents a huge leap forward for the field of regen- 
erative medicine. 

Scientists do not yet know which type of stem cells 
will be most effective in a specific area of research or 
cell therapy. Research on different types of stem cells 
has led to a better understanding of stem cell biology
and the power of pluripotent developmental potential.
Biotechnology researchers and pharmaceutical compa-
nies are developing stem cell lines for applications in
toxicology assays and in a wide variety of disease treat-
ments. The ability to make induced pluripotent (iPS) cells
has brought us closer to the reality of patient-specific 
stem cells suitable for cell transplant therapies that are 
predicted to alleviate many degenerative diseases and 
address the problem of the limited number of human 
organs donated for human transplant therapy. 


REVIEW 


This chapter describes both the great potential offered
by stem cells and the drawbacks associated with
the development of embryonic stem cell technology.
This chapter explains the science behind human embry- 
onic and adult stem cells, and describes the production 
of stem-cell-like induced pluripotent cells (iPS) derived 
from adult human skin cells. To assess your comprehen- 
sion of stem cell biology and technology in these topic 
areas, answer the following review questions: 


1. Explain the role played by the embryonic stem 
cells in the development of the late-stage embryo. 

2. Explain the fundamental differences between the 
developmental potential of embryonic stem cells 
and that of adult stem cells. 

3. What is the difference between a multipotent 
stem cell and a fully differentiated somatic cell? 

4. Describe the differences between multipotent
stem cells and pluripotent stem cells, and explain
the biological relationship between these two
types of cells.

5. Describe the advantages and disadvantages of 
medical cell replacement treatments that use cells 
from patient-specific blastocyst embryos. 

6. Explain the importance of the genome repro- 
gramming that occurs after fertilization or when 
a somatic cell nucleus is transferred into an 
“empty” egg (oocyte without a nucleus). 

7. Explain the fundamental difference between a 
DNA mutation in the genome and epigenetic 
changes affecting the genome. 

8. Describe the objections raised by the public against 
the use of embryonic stem cells for the develop- 
ment of stem cell therapies for medical treatments. 

9. To make induced pluripotent stem (iPS) cells in
the lab, scientists introduced into adult skin cells
genes that encode what type of proteins?

10. Describe the general structure of the human 
blastocyst embryo and explain what part of the 
blastocyst structure plays a key role in generating 
embryonic stem cells. 
A Cancer Genome: Scientists Decode Set of Cancer Genes


New York Times, November 5, 2008 

www.nytimes.com/2008/11/06/health/research/06cancer.html?pagewanted=2&fta=y

By Denise Grady 

For the first time, researchers have decoded all 
the genes of a person with cancer and found a set of 
mutations that may have caused the disease or aided its
progression. 

Using cells donated by a woman in her 50s who died 
of leukemia, the scientists sequenced the DNA from her 
cancer cells and compared it to the DNA from her own 
normal, healthy skin cells. Then, they zeroed in on 10 
mutations, DNA differences that occurred only in the can- 
cer cells, apparently spurring abnormal growth, preventing 
the cells from suppressing that growth and enabling them 
to fight off chemotherapy. 

Mutations are genetic mistakes, and the ones found 
in this research were not inborn, but developed later 
in life, like most mutations that cause cancer. (Only 5 to 
10 percent of all cancers are thought to be hereditary.) 

The new research, by looking at the entire genome— 
all the DNA—and aiming to find all the mutations involved 
in a particular cancer, differs markedly from earlier 
studies, which have searched fewer genes for individual 
mutations. 
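The strategy described in this article, keeping only the DNA differences found in the cancer cells but not in the same person’s healthy cells, can be illustrated with a toy comparison. This sketch assumes two short, already-aligned sequences of equal length; the sequences and the function name are made up for illustration. Real cancer genome studies compare billions of sequencing reads using statistical variant-calling software.

```python
# Toy tumor-versus-normal comparison (illustrative only).


def somatic_differences(normal: str, tumor: str):
    """Return (position, normal_base, tumor_base) for every mismatch between
    two equal-length, already-aligned sequences. Positions are 1-based."""
    if len(normal) != len(tumor):
        raise ValueError("this toy sketch assumes pre-aligned sequences of equal length")
    return [
        (i + 1, n, t)
        for i, (n, t) in enumerate(zip(normal.upper(), tumor.upper()))
        if n != t
    ]


if __name__ == "__main__":
    # Hypothetical 20-base stretches from the same person (made-up sequences).
    skin = "ATGGCCTTAGCAGTCCGATA"
    leukemia = "ATGGCCTTAGTAGTCCGAAA"
    for pos, was, now in somatic_differences(skin, leukemia):
        print(f"position {pos}: {was} -> {now} (present only in the cancer cells)")
```

Because both samples come from the same person, inherited differences between people cancel out, and what remains points to mutations acquired by the cancer cells.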


LOOKING AHEAD 


This chapter describes the relatively new and rapidly 
growing field of pharmaceutical biotechnology. This 
research focuses on using advances in biotechnology 
to improve many aspects of drug development and 
action. A better understanding of the biological tar- 
gets of drugs in the body will improve the chances of 
designing new, medically effective drugs. Designing 
drugs for treating humans is an extremely important 
aspect of pharmaceutical research. Over the past dec- 
ade, the science of drug design grew in response to 
new information from the human genome sequence. 
New studies revealing the three-dimensional structures 
of the proteins that cause different diseases were a key 
step in drug design. Pharmaceutical biotechnology 
includes the amazing progress made in some of the 
most intriguing new scientific fields including carbon 
nanotechnology, lab-on-a-chip technology, and DNA 
computers. 
After completing this chapter, you should be able to
do the following: 


• Discuss the process of drug design in the context of
personalized medicine.

• Distinguish between the fields of genomics and
proteomics and describe an example of research
performed in each field.

• Understand the challenges posed by the process of
drug delivery in the human body.

• Better understand the basic approaches used to
design new drugs.

• Describe the unusual properties of carbon-based
nanostructures built using nanotechnology.

• Explain the technology behind lab-on-a-chip appli-
cations and suggest possible future applications of
this technology.

• Explain why nanoparticles might have advantages
as vehicles to deliver drugs to tumor cells.

• Describe the development and possible applica-
tions of DNA computers.

• Understand the basic risks and security issues asso-
ciated with genome privacy.


INTRODUCTION 


The field of pharmaceutical biotechnology represents the 
marriage of many scientific specialties including genom- 
ics, proteomics, personalized medicine, drug discov- 
ery and development, lab-on-a-chip microtechnology, 
nanomedicine, and more. Scientists doing research in 
pharmaceutical biotechnology are focused on increas- 
ing the effectiveness of the drug treatments while mini- 
mizing potentially serious side effects. Pharmaceutical 
biotechnology research also focuses on developing new 
medicines and therapies to treat diseases in humans. 
Access to the complete DNA sequence of the human 
genome, including the continual sequence corrections 
and updates, provides a powerful resource that offers 
scientists a much more detailed picture of how human 
cells work and what goes wrong when disease strikes. 

Scientists must have a detailed understanding of the 
intricate processes inside the cell in order to design 
drugs that can function alongside normal bodily 
activities and also kill pathogens, often in the same cells. 
This information is essential for research in pharmaceu- 
tical biotechnology. Inside the human body, pharma- 
ceutical drugs must maintain a fine balance between 
offering an effective treatment for a disease while at 
the same time avoiding potentially serious side effects. 
The most effective medications are drugs that do not 
interact with most cellular components while success- 
fully attacking the intended target molecules that cause 
the disease. 
FIGURE 13.1 Microarrays are used to detect genes that are 
expressed in some cells and not other cells. Scientists use microarray 
technology to study the expression of entire genomes to determine 
how gene expression is altered under different conditions. The two 
plates shown contain the same types of cells; the cells on the left 
were treated with the new drug, whereas the cells on the right were 
not treated (control cells). The mRNAs made in both cell samples 
were isolated separately, copied into cDNA probes, and analyzed by 
microarray technology. 


Genomics Is an Important Tool in 
Pharmaceutical Biotechnology 


When the Human Genome Project revealed the entire 
DNA sequence of the human genome, for the first time 
scientists had access to all of the human genes, which 
had a positive impact on the growth of pharmaceutical 
biotechnology. Genomics research encompasses much 
more than the study of the human genome; it also 
includes analysis of nonhuman genomes and of both 
coding and noncoding sequences from prokaryotic and 
eukaryotic organisms. Researchers study the structure 
and function of the encoded genes to explore how the 
gene products interact with each other and with factors 
in the environment. In pharmacogenomics research, 
scientists often use microarray technology to analyze 
changes in the expression of hundreds or thousands of 
genes in cells either treated with a drug or induced to 
undergo differentiation (Figure 13.1 and Figure 13.2). 
Scientists use microarray technology to find out 
which genes are expressed in healthy cells compared 
to diseased cells or to determine how an administered 
FIGURE 13.2 Microarray DNA base pairs with cDNA probes. The 
genome DNA fragments are immobilized onto a microarray grid. The 
mRNA molecules isolated from the cells are copied into labeled, 
complementary single-stranded DNAs (cDNAs) and used as hybridization 
probes to search the microarray grid for complementary DNA 
sequences. (Inset) A closeup of the signals emitted by probes hybrid- 
ized to DNA on the microarray. When a gene is expressed as mRNAs 
in a cell, the DNA spot representing that gene on the microarray will 
base pair to the cDNA probe and give off a detectable signal. 


drug might affect the expression of specific genes in 
healthy and diseased cells. For this type of experi- 
ment, a scientist starts with two identical samples of 
the same cells but treats just one of the cell samples 
with a new drug (experimental sample). The remain- 
ing cell sample is not treated and is used as the con- 
trol cell sample. The genes expressed as mRNAs in the 
two different cell samples are isolated and analyzed 
separately. The mRNA molecules are very fragile and 
cannot be easily manipulated or studied. It is prefer- 
able to convert the mRNA molecules into DNA strands 
by copying the mRNAs into cDNA (complementary 
DNA) molecules using reverse transcriptase enzyme. 
The single-stranded cDNA molecules are synthesized 
with incorporated labels that permit the DNA strands 
to be used as hybridization probes to detect and base 
pair to the complementary DNA sequences displayed 
on the microarray grid. The microarray grid contains a 
large array of DNA spots distributed in a uniform 
pattern that permits each spot to be located. The DNA on 
the microarray represents genome DNA isolated from 
the control cells and contains DNA from all of the 
genes in the cell genome. The results of this analysis 
are interpreted from the spots of DNA on the microar- 
ray that give a positive signal indicating the presence 
of complementary DNA. When a particular gene is 
expressed as RNA in the cell, then the specific DNA 
spot(s) representing that gene on the microarray grid 
will indicate that fact by emitting a visual signal when 
the chromosome DNA base pairs with the cDNA 
probes copied from that gene’s mRNAs. Microarray 
analyses allow scientists to monitor changes in the 
expression of hundreds or thousands of genes in many 
different types of cells under a wide variety of different 
growth conditions and drug treatments. 
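For readers curious about how the comparison between treated and control cells might be summarized numerically, the Python sketch below computes a log2 ratio of spot intensities for each gene and labels genes as up- or down-regulated. The gene names, intensity values, the pseudocount, and the two-fold cutoff are illustrative assumptions; real microarray studies also normalize the signals, use replicate arrays, and apply statistical tests before drawing conclusions.

```python
import math


def log2_fold_changes(treated, control, pseudocount=1.0):
    """Return log2(treated / control) for every gene measured in both samples.

    A small pseudocount is added to each intensity so that a spot with
    no signal does not cause a division by zero or a log of zero.
    """
    changes = {}
    for gene in sorted(treated.keys() & control.keys()):
        ratio = (treated[gene] + pseudocount) / (control[gene] + pseudocount)
        changes[gene] = math.log2(ratio)
    return changes


def classify(changes, threshold=1.0):
    """Label each gene; a log2 threshold of 1.0 corresponds to a two-fold change."""
    labels = {}
    for gene, value in changes.items():
        if value >= threshold:
            labels[gene] = "up"
        elif value <= -threshold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


# Hypothetical spot intensities (arbitrary units) from drug-treated and control cells.
treated = {"geneA": 799.0, "geneB": 49.0, "geneC": 300.0}
control = {"geneA": 99.0, "geneB": 199.0, "geneC": 310.0}

changes = log2_fold_changes(treated, control)
print(classify(changes))  # {'geneA': 'up', 'geneB': 'down', 'geneC': 'unchanged'}
```

A positive log2 value means the gene gave a stronger signal in the drug-treated cells; a negative value means a weaker one.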

Genomics involves studying the cell from the stand- 
point of the genome and the nucleus, with a focus on 
the genes, whereas proteomics is the study of the huge 
collection of proteins made in a typical cell. The term 
“proteome” was first used in 1994 to describe all the 
proteins that are expressed in a given cell at a given 
time. The genes in the human genome provide the basic 
instructions for the proteins that make the molecular 
machines in the cells. Proteins are the cellular 
components that perform so many different jobs in the body. 
For example, proteins make muscles move, they digest 
food into the nutrients absorbed by the cells, and they 
help nerve cells to send signals in the brain. The field of 
proteomics includes the study of interactions between 
and among proteins, protein modifications, and other 
changes that affect protein functions in the cell. 


Proteomics researchers analyze the structures and func- 
tions of the thousands of different proteins expressed in the 
trillions of cells in the human body. Genomics researchers 
focus on the DNA genome and the inherited genes that 
direct the synthesis of all the proteins in the cell. 


Protein Profiles Change in Diseased Cells 


The field of proteomics includes the various modi- 
fied and processed proteins produced in different 
cell types. Three common types of posttranslational 
modifications are phosphorylation (phosphates 
added to proteins), methylation (methyl groups added 
to proteins), and glycosylation (sugar units added to 
proteins). Proteomics researchers also compare the 
different proteins made in diseased cells and healthy 
cells, searching for clues about how the disease proc- 
ess might affect the proteins made in the cell. 

Studies on the spectrum of proteins made in a 
specific cell type, a protein profile, have successfully 
identified biomarkers, specific proteins that consistently 
change in some way with the onset of a disease. 
Biomarker proteins can serve as characteristic 
indicators of specific diseases, and many can be 
easily detected in the patient’s blood or urine. 
“Early” biomarker proteins are produced in cells during 
the early stages of a disease, before full-blown dis- 
ease symptoms occur. A commonly known example is 
prostate-specific antigen (PSA), a protein that is made 
by the prostate gland and measured in the blood by 
a PSA test. In many cases increased PSA levels result 
from the uncontrolled growth of prostate cells, which 
is a possible indication of early prostate cancer. This 
allows doctors to use PSA as an early biomarker 
indicator for this cancer. Unfortunately, the PSA test does 
not provide enough information to distinguish between 
real prostate cancer and benign conditions of the pros- 
tate that also raise PSA levels. But an increased PSA 
level should not be ignored and usually prompts both 
the doctor and patient to follow up on the possibility 
of prostate cancer. 


The huge numbers of proteins visualized in the protein pro- 
files of normal and diseased cells are compared by compu- 
ter to identify the key proteins produced only in the diseased 
cells. These proteins might be essential to the disease proc- 
ess and therefore are potential targets for drug development. 
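Under simplifying assumptions, the computer comparison of protein profiles described above can be sketched as a set difference: proteins detected in the diseased cells but not in the normal cells are flagged as candidates. The protein identifiers, abundance values, and the detection_limit parameter below are hypothetical; actual proteomics software must first identify and quantify each protein from the raw measurements.

```python
def disease_specific_proteins(diseased, normal, detection_limit=0.0):
    """Return proteins detected above detection_limit in diseased cells but not in normal cells.

    Each profile maps a protein identifier to its measured abundance.
    """
    detected_diseased = {p for p, amount in diseased.items() if amount > detection_limit}
    detected_normal = {p for p, amount in normal.items() if amount > detection_limit}
    return sorted(detected_diseased - detected_normal)


# Hypothetical protein profiles (arbitrary abundance units).
normal = {"P1": 12.0, "P2": 5.5, "P3": 0.0, "P4": 8.1}
diseased = {"P1": 11.7, "P2": 5.9, "P3": 4.2, "P4": 0.05, "P5": 3.3}

print(disease_specific_proteins(diseased, normal, detection_limit=0.1))  # ['P3', 'P5']
```

Treating very weak signals as "not detected" keeps background noise from being mistaken for a disease-specific protein.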


PERSONALIZED MEDICINE: DREAM 
OR REALITY? 


Human genome research continues to have a large 
impact on our understanding of human genetic diseases 
and the large role that genes play in conferring biologi- 
cal traits. Many scientists predict that in the future the 
use of genetic information taken from an individual’s 
genome, called personalized medicine, will greatly 
improve the diagnosis and treatment of human dis- 
eases. If personalized medicine becomes standard, it 
will be routine for the nurse in the doctor's office to put 
a drop of fingertip blood from a patient into a handheld 
lab device for immediate analysis. Once the tests are 
complete, a personal genetic profile will appear on the 
handheld computer screen. If medication is needed, 
the doctor will include an analysis of the patient’s genome 
information when deciding the appropriate drug to 
prescribe. The long-term success of personalized medicine 
will depend in part on the public’s willingness to have 
their genes tested, the willingness of doctors to change 
how they prescribe drugs, and the level of genome 
security that will be available to patients. 


Unusual Drug Reactions Linked to 
Genetic Variation 


The normal variations found in the DNA sequences 
of different human genomes such as RFLPs and SNPs 
are linked to the inheritance of some specific human 
diseases and disorders (see Chapters 6 and 10). These 
genetic variations in the human genome can influ- 
ence the way that individuals respond to treatment 
with different drugs. For example, certain painkiller 
medications are effective in the body only after specific 
activator proteins convert the drug from an inactive to 
an active form. The idea that genes are involved in the 
FIGURE 13.3 Cheek swab is used to collect cells and DNA. The 
cell samples for DNA testing are collected using a safe, inexpensive, 
and painless method such as gently scraping cells from the inside of 
the cheek using a swab. The cells on the swab are then processed for 
DNA testing. 


different responses that some people have to the same 
drug is not new. About 40 years ago, scientists found 
that drugs were removed (cleared) from the body at dif- 
ferent rates in identical and fraternal twins, showing that 
genes do contribute to variable drug responses. These 
experiments contributed to the formation of the modern 
fields of pharmacogenetics and pharmacogenomics. 

Scientists are studying the genomes of people who 
react differently to common medicines by search- 
ing for specific variations in genome DNA sequences 
that correlate with an individual’s reactions to certain 
drugs. Samples for DNA testing are easily collected 
from blood or more routinely by scraping the cheek 
cells from inside the mouth (Figure 13.3). These studies 
showed that certain proteins responded to drugs dif- 
ferently in different individuals as a direct result of the 
genetic variation between human genomes. Studies 
on the genetic makeup of individual people provide 
important information for scientists and health care 
professionals but also pose potential privacy risks for 
patients, even if the personal genetic information is 
stored in a secure computer database to protect the 
privacy of the patients. 

Proponents say that personalized medicine will 
help patients to avoid the unnecessary and potentially 
dangerous side effects that can result from the routine 
practice of one-size-fits-all medicine. The research inno- 
vations offered by personalized medical care will also 
enhance our understanding of the human genes that 
contribute to cancer and many other human diseases. 
The widespread use of personalized medicine will 
depend in part on continued rapid advances in the 
science of genetic testing. Researchers can now perform 
relatively inexpensive and very accurate genotype 
analyses using high-throughput genetic screening tests 
that are already available to the public. Despite such 
advances, few physicians actually practice personalized 
medicine and few customized drugs have been devel- 
oped for use in humans. 


A patient’s personal genetic information could help 
doctors to prescribe the right medicine for that individual and 
avoid side effects from the medications. This would represent 
a big step toward making medicines much more effective 
and less costly by reducing the risk of harmful drug reactions. 


Personalized Cancer Treatments 


In the mid-2000s, the United States government 
invested $100 million in the Cancer Genome Atlas 
project and gave much smaller amounts to other 
research programs designed to study the genomes of 
patients with different types of cancer. The goal was 
to identify the collections of genome DNA differ- 
ences that occur in cancer cell genomes but are not 
present in the genomes of healthy cells from the same 
individual. Different types of cancers develop in differ- 
ent tissues, but the process always starts with a single 
cell that acquires a new DNA mutation in its genome 
DNA. This new DNA mutation is not usually sufficient 
to cause the cell to become transformed into a cancer 
cell, but over time the cell genome acquires additional 
DNA mutations. The development of the cancer-like 
characteristics in the cell does not occur until the cell 
genome has acquired a certain set of DNA mutations 
that act together to convert the normal cell into a can- 
cer cell. This set of mutations is often called the cancer 
genome. 
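The comparison at the heart of these cancer genome studies, finding DNA differences present in a patient’s cancer cells but absent from the same person’s healthy cells, can be sketched in Python as follows. Each variant is recorded as a chromosome, position, reference base, and observed base; the class name, function name, and positions are invented for illustration, and real studies must first call these variants from millions of sequencing reads using statistical methods.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chromosome: str
    position: int
    ref: str  # base found in the reference genome
    alt: str  # different base observed in this sample


def somatic_variants(tumor_calls, normal_calls):
    """Return variants present in the tumor genome but not in the same person's healthy cells.

    Inherited (germline) variants appear in both samples and are therefore filtered out.
    """
    return sorted(set(tumor_calls) - set(normal_calls))


# Hypothetical variant calls from one patient.
normal = [Variant("chr2", 1500, "A", "G"), Variant("chr7", 880, "C", "T")]
tumor = normal + [Variant("chr13", 402, "G", "A"), Variant("chr4", 77, "T", "C")]

for variant in somatic_variants(tumor, normal):
    print(variant)
```

Only the two changes unique to the tumor are reported; the two inherited variants shared with healthy cells drop out of the comparison.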

Scientists working on the Cancer Genome Project 
(CGP) (Washington University) focused on identify- 
ing specific DNA genome mutations that are involved 
in developing acute myelogenous leukemia (AML), 
an aggressive cancer that affects about 13,000 adult 
Americans each year and kills 8,800. The scientists 
studied the genome of an AML patient who volun- 
teered for the Washington University study, and they 
found eight mutations linked specifically to cancer 
development. Unfortunately, AML took this patient's 
life just two years after the cancer diagnosis despite 
chemotherapy and two bone marrow transplant treat- 
ments. In the future researchers hope to avoid this out- 
come by using information from the cancer genome 
study. Sometimes if doctors know early enough that 
a specific patient's cancer is likely to be unusually 
aggressive, they can decide to prescribe more powerful 
treatments much earlier in the course of the disease. 
Further studies on the AML-specific genome variations 
will help to determine which cellular function 
is affected by each mutation. For example, 
changes in the genome DNA might increase the activity 
of genes that promote the growth of cancer cells 
(oncogenes) or inactivate the genes that code for the 
tumor suppressor proteins that usually protect healthy 
cells from cancer (see Chapter 9). Other mutations 
give tumor cells an advantage by efficiently 
removing chemotherapeutic drugs from the cells 
before the cancer cells are damaged or destroyed. 
Scientists continue to search the human genome for 
the specific DNA mutations that finally tip the cell’s 
biochemical balance toward becoming a cancer cell. 
It is important to understand the signal, set of signals, 
or biochemical events that cause the cell to abandon 
normal growth controls and become cancerous. 


Personal Genome Science: Google 
Your DNA 


Over the years, the cost of DNA sequence analysis 
decreased substantially because of improved DNA 
sequencing technology and the widespread use of 
automated DNA sequencing machines (see Chapter 6). 
Ongoing innovations in DNA sequencing technology 
will continue to provide faster and even more accu- 
rate DNA sequencing methods, and further important 
information will come from comparing the genome 
DNA sequences from many different individuals. The 
members of the Personal Genome Project (PGP) vol- 
unteer to share their personal genome DNA sequences 
and personal genetic information with other people. 
Despite the potential security issues, they even post 
their DNA sequences on the Internet to share with the 
entire world. An important goal of the PGP is to make 
personal genome sequencing more affordable and 
accessible and to help people to better understand how 
genes and the environment influence human traits. 

Scientists no longer need to actually determine the 
complete DNA sequence of an entire genome in order 
to glean a large amount of genetic information about 
an individual. Sometimes instead of sequencing an 
entire genome, scientists analyze only those regions of 
the genome that vary in DNA sequence between indi- 
viduals. This approach forms the scientific basis for the 
DNA fingerprinting technology used in forensic DNA 
testing (see Chapter 8). Variable regions of the genome 
often include single base pair differences (SNPs) in 
individual human genomes. Many of these SNPs and 
other specific variations in DNA sequences have been 
genetically linked to an increased risk for developing a 
specific human disease. 
The science of personal genomics is in its infancy 
but has already had a large impact on the fields of 
medicine, health insurance, and economics. Personal 
genomics has immediate applications in health care 
because of the wealth of information in an individual’s 
genome that can be accessed and interpreted by 
genetic testing. In the past, the ability to explore the 
human genome was the exclusive domain of 
scientists working in sophisticated research labs. This 
situation changed dramatically in 2007 when biotech- 
nology companies began to offer commercial genome 
testing for the first time. As part of their genetic testing 
services, the companies send clients their genotype 
results as well as an interpretation of the test results, 
including the potential lifetime risk of developing cer- 
tain genetic diseases (Figure 13.4). 

When a reporter from the New York Times became 
one of the first people to have her genome tested, 
23andMe analyzed 550,000 SNPs in her DNA and 
identified the sequence differences, or genotypes, at those 
locations in her genome (Figure 13.5). For example, the 
reporter’s genotype at the SNP linked to adult lactose 
intolerance is “GG,” in agreement with her experience 
that she is indeed unable to tolerate dairy foods 
containing lactose. A different genotype at that SNP 
location is associated with the dominant trait of easily 
digesting lactose as an adult. Three biotechnology companies 
(23andMe, deCODE Genetics, and Navigenics) used similar 
methods to analyze the genotypes of different SNPs in each 
genome. At that time, all of the genome testing companies 
voluntarily offered some type of educational assistance to help 
people to better understand the information in their own 
genomes. However, this assistance ranged from professional 
genetic counselors available for consultation to 
offers of referrals to genetic counselors. The cost of this 
FIGURE 13.4 Tiny variations in the human genome can have a large impact on the person. (A) The DNA sequence of the genome of a New 
York Times (NYT) reporter was tested by the 23andMe company in 2007. The company determined the specific order of the DNA base pairs 
at many regions of the DNA helix in the reporter’s genome. (B) The order (sequence) of the DNA bases is almost identical in different human 
genomes, but the sequences can vary, and there are millions of genome locations where the human genome sequences differ by single base 
pairs called single nucleotide polymorphisms (SNPs) (see Chapter 8). In a specific SNP location in a DNA genome, person 1 inherited the 
same base pair from each parent, whereas person 2 inherited different base pairs at that site. In other words, at this specific SNP location, 
person 1 has a genotype of “GG,” and person 2 has the genotype “AG.” (red circles) (C) To understand the biological consequences of 
inheriting different SNP genotypes, scientists studied the genotypes of people with various traits and established correlations between certain 
SNP genotypes and inheriting certain traits. In the example offered by the reporter's genome, an SNP variant located near a gene (LCT) that 
encodes a lactase enzyme is correlated with whether or not the lactase gene is expressed during adulthood. People with the genotype “GG” 
at this SNP, like the reporter, are more likely to experience lactose intolerance than people who inherited the “AG” or “AA” genotypes at that 
SNP genome location. 
genome analysis service in 2007 ranged from the most 
expensive, $2500 for Navigenics (a price that also 
included genetic counseling), to about $1000 for 
23andMe ($999) and deCODE Genetics ($985). 
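To show how a testing service might turn a raw two-letter genotype into a plain-language statement, here is a minimal Python sketch that follows the correlation described for the SNP near the LCT gene in Figure 13.4: “GG” is more likely to be associated with lactose intolerance, while “AG” or “AA” is less likely. The function name and the wording of the results are hypothetical; real reports are based on specific SNP identifiers and published risk estimates, and the output is a likelihood, not a diagnosis.

```python
VALID_BASES = set("ACGT")


def interpret_lactase_snp(genotype):
    """Hypothetical interpretation of the LCT-linked SNP genotype described in Figure 13.4.

    A genotype is two letters, one allele inherited from each parent (for example "AG").
    """
    genotype = genotype.strip().upper()
    if len(genotype) != 2 or not set(genotype) <= VALID_BASES:
        raise ValueError(f"expected two bases such as 'AG', got {genotype!r}")
    if genotype == "GG":
        return "more likely to be lactose intolerant as an adult"
    if "A" in genotype and set(genotype) <= {"A", "G"}:
        return "more likely to digest lactose as an adult"
    return "no trait correlation recorded for this genotype"


for person, genotype in [("reporter", "GG"), ("person 2", "AG"), ("person 3", "AA")]:
    print(person, genotype, "->", interpret_lactase_snp(genotype))
```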

Reporting genetic results without information or 
support from a genetic counselor or other healthcare 
professional is not ideal and raises the possibility that 
people will easily misinterpret the genetic predictions. 
For example, in the case of the “sports gene” test 
described in Box 13.1, it should be made clear to 
customers that inheriting any combination of ACTN3 
alleles does not guarantee that a child will develop the 
physical attributes and talents of a natural athlete and 
compete in the Olympics. People should be reminded 
that human traits, including personality and behavior, 
are shaped not only by the expression of inherited genes 
but also by the environment, lifetime experiences, and diet. 


SNP          Location    Genotype 
rs662799     APOA5       AA 
rs174575     FADS2       CC 
rs6920220    6q23        GG 
rs17070145   KIBRA       CC 
rs1801260    CLOCK       AA 
rs1953558    OR11H7P     CC 
rs17822931   ABCC11      CC 
rs4613903    TAS2R38     CG 
rs3751812    FTO         GG 
The availability of these personal genomics testing 
services has raised concerns about standardizing 
access to genetic counselors and increasing educational 
support services to help people better understand 
their own personal genome information. These developments 
suggest that personal genomics will affect a wide 
audience including doctors, medical educators, medi- 
cal economists, insurers, and policy makers as well as 
the general public. Access to educational resources 
for individuals who have their genomes analyzed will 
be very important to help people to recognize misin- 
formation and to be able to sort among the possible 
solutions to a problem according to the quality of the 
supporting evidence. 

The personal genomics industry has already 
begun to spawn new businesses that respond to the 
demand for commercial products and services that are 
FIGURE 13.5 Different SNP genotypes are linked to different human traits. When the DNA genome of the NYT reporter was tested by the 
23andMe company in 2007, the company analyzed 550,000 SNPs in her genome. The company screened many human genomes and 
determined the specific DNA sequences or genotypes at specific SNP genome locations. Specific SNP genotypes are correlated with inheriting 
certain human traits such as lower body-mass index. 


Box 13.1 Born to Run? 


Little Ones Get Test for Sports Gene 


New York Times, November 29, 2008 

www.nytimes.com/2008/11/30/sports/30genetics.html?_r=1&emc=etal 

By Juliet Macur 

A new DNA test recently became available, which boasts 
that it offers a way to test future athletic potential. Many par- 
ents were attracted to the possibility that DNA testing might 
be able to predict the athletic potential of their toddler. The 
“sports gene” test actually searches the human genome 
DNA for the human ACTN3 gene, which encodes the alpha- 
actinin-3 protein (Figure 13.6). The R gene variant (allele) of 
the ACTN3 gene produces the alpha-actinin-3 protein in the 
fast-twitch muscle fibers that perform the powerful, rapid 
contractions needed for speed, power, and endurance. The 
X allele of the ACTN3 gene does not produce functional 
alpha-actinin-3 protein, so the protein is missing from these cells. 

To find out how the different forms of the human ACTN3 
gene might influence the inheritance of athletic traits, the 
scientists determined which combination of ACTN3 alleles 
(R, R or R, X or X, X) was inherited by each of 429 top athletes, 
including 50 Olympians (see Figure 13.6). Then the scientists 
looked for correlations between the athletes’ ACTN3 
genotypes and proven athletic performance. The results of this 
study showed that people who inherit either the (R, R) or 
(R, X) combination of alleles have an advantage in terms of 
natural power and endurance, but those inheriting the X, X 
genotype apparently do not exhibit enhanced athletic traits. 

The ACTN3 sports gene test was first marketed in 2004 
in Australia, Europe and Japan by an Australian company, 
Genetic Technologies, and became available in the United 
States through Atlas Sports Genetics a few years later. The 
DNA sample is processed by the Atlas Sports Genetics lab for 
genotype analysis at a cost of $149 per test (2008). In a few 
weeks the DNA test results are sent to the customer by mail, 
along with information suggesting which sports are most 
appropriate for the child to pursue to help the child reach his 
or her full athletic potential. 
FIGURE 13.6 Human “sports gene” ACTN3 encodes the alpha-actinin-3 protein in muscle cells. (A) DNA variations in the ACTN3 
gene on chromosome 11 affect inherited athletic traits. (B) The R allele (version) of the ACTN3 gene produces a muscle protein that has 
an important role in the function of fast-twitch muscle fibers. The R,R or R,X genotypes are beneficial for both power and endurance 
athletes, but the X,X alleles of the ACTN3 gene do not confer an athletic advantage. (C) The frequencies of the three ACTN3 genotypes 
(RR, RX or XX) in the population were determined to see if there is a correlation between specific ACTN3 alleles and people who are 
athletes with proven endurance or power skills. 


designed to enhance the use of genomic information by 
individuals and healthcare professionals. As commercial 
genome sequencing and related DNA services become 
more popular, quality control will become increas- 
ingly important; imagine the damage caused by a mis- 
taken genetic diagnosis of a terrible, fatal disease like 
Huntington’s (see Chapter 10). The FDA could poten- 
tially be responsible for approving products associated 
with genome analysis services and specific medical 
applications. Genome privacy issues must be balanced 
with the ever-present need to increase corporate profits. 


DRUG DISCOVERY AND DEVELOPMENT 


Research in many areas of pharmaceutical biotech- 
nology provides critically important information to 
help scientists to design and produce new and better 
medicines to treat disease. This research greatly 
contributes to our understanding of the human genes that 
are involved in cancers, heart disease, and many other 
diseases, just as research in these areas contributes to 
advances in pharmaceutical biotechnology. A drug is 
defined as any chemical that is used for the 
diagnosis, cure, treatment, or prevention of diseases 
in humans and animals. There are many types of drugs, 
including many that act by physically binding 
to a wide range of target molecules in the cells. The first step in 
developing an effective new drug often begins with 
studying the biochemical mechanisms involved in the 
specific disease process the drug will be used to treat. 
Most of the biochemical reactions in the cell, includ- 
ing disease processes, involve the actions of protein 
enzymes. Many protein-protein interactions can result 
in the assembly of large protein complexes contain- 
ing many smaller proteins (Figure 13.7). When a pro- 
tein binds to a drug, the activity of the protein can be 
enhanced or inhibited by the interaction, which can 
block the disease process (Figure 13.8). The goal of this 
research is to find out how a given disease affects the 
cell’s mechanisms, looking for key proteins that might 
be possible targets for the action of the drug. A drug 
FIGURE 13.7 Proteins bind to other proteins in the cell. (A) Proteins 
bind to other proteins to form everything from dimers with two 
proteins to the assembly of multi-protein complexes that often contain 
many protein subunits. Proteins bind to each other if the surfaces of 
the proteins have the appropriate chemical structures and shapes to 
fit together in three-dimensional space. (B) The trypsin enzyme 
protein (green) is shown bound to the bovine trypsin inhibitor protein 
(blue). These two proteins bind to each other by interacting through 
special regions of both proteins (red circle). 


FIGURE 13.8 Chemical is docked in the protein active site. The 
chemical compound (molecular stick structure) interacts with the 
protein by forming chemical bonds with specific amino acid side 
chains in the active site of the enzyme, blocking enzyme activity 
(see Chapter 3). 
generally works best when it binds its target protein with high 
affinity and specificity, because this reduces nonspecific 
binding of the drug to random cellular components 
that are not part of the disease process. 
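Binding affinity is commonly expressed as a dissociation constant, Kd: the drug concentration at which half of the target protein molecules are occupied, so a smaller Kd means tighter binding. The Python sketch below uses the simple one-site equilibrium relationship, fraction bound = [D] / ([D] + Kd), to show why a high-affinity drug occupies far more of its target at the same dose. It assumes a single binding site and a free drug concentration that is not depleted by binding; the function name, concentrations, and Kd values are illustrative, not measurements for any real drug.

```python
def fraction_bound(drug_concentration_nM, kd_nM):
    """Fraction of target protein occupied by drug at equilibrium: [D] / ([D] + Kd).

    Assumes one binding site per protein and that binding does not
    noticeably lower the free drug concentration.
    """
    if drug_concentration_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return drug_concentration_nM / (drug_concentration_nM + kd_nM)


dose_nM = 10.0
for kd in (1.0, 1000.0):  # high-affinity versus low-affinity hypothetical drugs
    print(f"Kd = {kd:g} nM -> {fraction_bound(dose_nM, kd):.1%} of target bound")
```

At the same 10 nM dose, the hypothetical high-affinity drug (Kd = 1 nM) occupies about 91% of its target, while the low-affinity drug (Kd = 1000 nM) occupies only about 1%.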

Receptor proteins are often used as the targets for 
drugs because receptors are exposed on the surfaces 
of the cells, where they bind to specific molecules in the 
cell’s environment. This makes it important for 
scientists to identify the functions of the proteins involved 
in the disease being treated. An unknown receptor 
protein can often be identified by comparing its 
amino acid sequence 
with databases of proteins with known functions (see 
Chapter 7). Structure-based approaches to drug 
design combine sequence information from 
bioinformatics with structural evidence from x-ray 
crystallography and nuclear magnetic resonance (NMR) 
spectroscopy studies. These methods reveal the 
molecular composition and three-dimensional structure 
of the unidentified protein. The information about the 
biological target molecule including the DNA and pro- 
tein sequences, genetic variation, genetic maps, gene 
and protein expression data, and the protein structure 
results are used to devise biochemical and genetic 
screens to find effective new drugs. 
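The sequence comparison step described above can be illustrated with a simple percent identity score. The Python sketch below assumes the sequences have already been aligned to the same length (with "-" marking gaps); real search tools such as BLAST perform the alignment themselves and also estimate statistical significance. The sequence names and amino acid strings are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned, gap-free positions at which two sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must already be aligned to the same length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def best_match(query, database):
    """Return (name, score) for the known protein most identical to the query."""
    scores = {name: percent_identity(query, seq) for name, seq in database.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical unknown receptor and known proteins, pre-aligned to the same length.
unknown = "MKTAYIAKQR"
known_proteins = {
    "known_receptor_1": "MKTAYLAKQR",
    "known_enzyme_2": "MSTGY-AEHR",
}

name, score = best_match(unknown, known_proteins)
print(f"{name}: {score:.1f}% identical")  # known_receptor_1: 90.0% identical
```

A high identity score with a protein of known function suggests, but does not prove, that the unknown receptor has a similar function.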

The structural approach to drug design is based on 
understanding the molecular structures of the biologi- 
cal drug targets, with special attention paid to regions 
that interact with the active sites on the enzymes. 
Other drugs are specifically designed to inhibit key 
physical interactions between a protein and its target. 
This approach has potential, but engineered drugs 
sometimes do not bind to their targets as predicted 
when tested, probably because the engineered mole- 
cules are designed to bind to the static structure of the 
target protein, and not to the moving, “living” protein 
that changes shape in the cell. How a protein changes 
conformation in the cell is an important factor that 
must be considered by scientists working to improve 
drug design. 

High-throughput screening for new drugs starts with
special computer software that creates structural mod- 
els of a drug molecule using the information gained 
from studying interactions between the drug and estab- 
lished biological targets. When the structure and func- 
tion of the active site of the receptor protein are well 
understood, scientists can use computer programs to 
design new drug molecules predicted to bind tightly 
to the target protein (Figure 13.9). Computer docking 
programs are used to test large libraries of chemicals 
for the ability to bind to the biological target, followed 
by other tests to assess the specificity and strength of 
the drug binding to the target protein. This method is 
used to identify molecules that interact accurately with 
the chosen target and do not bind nonspecifically to 
FIGURE 13.9 The HIV protease enzyme is essential for an AIDS
infection to develop. The active site of the HIV protease enzyme
looks like a three-dimensional pocket in the protein structure. A
drug designed to fit inside the active site has the potential to block
the activity of the enzyme, stop the virus, and treat the disease.
The x-ray crystallography studies show the molecular interactions
between the HIV protease and the atoms of the drug (Kaletra): carbon
atoms (gray), nitrogen (blue), and oxygen (red).


similar molecules. One example is an AIDS drug that 
was designed with the right shape and other chemical 
characteristics so that it fits inside the active site of the 
HIV protease enzyme, much like a key fits into a lock. 
X-ray crystallography studies reveal the snug molecular
interactions between the HIV protease and the atoms of
the drug (see Figure 13.9).
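The core loop of a docking-based screen (score each candidate molecule in the pocket, then rank the library) can be sketched in a few lines. This is a toy illustration under strong assumptions: each candidate is given as a single fixed pose of atom coordinates, the pocket is a handful of hypothetical atoms, and the score is a generic 12-6 Lennard-Jones contact energy with made-up parameters. Real docking programs also search over orientations and flexible conformations and use far more detailed scoring functions that account for hydrogen bonds and electrostatics.

```python
import math

Point = tuple[float, float, float]


def lj_energy(r: float, epsilon: float = 0.2, sigma: float = 3.5) -> float:
    """12-6 Lennard-Jones pair energy; r and sigma in angstroms, arbitrary energy units."""
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6 * sr6 - sr6)


def pose_score(ligand: list[Point], pocket: list[Point]) -> float:
    """Sum of pairwise contact energies; lower (more negative) means a better fit."""
    return sum(lj_energy(math.dist(a, b)) for a in ligand for b in pocket)


def rank_library(library: dict[str, list[Point]],
                 pocket: list[Point]) -> list[tuple[str, float]]:
    """Score every candidate and sort from best (lowest energy) to worst."""
    scored = [(name, pose_score(atoms, pocket)) for name, atoms in library.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical pocket: six atoms lining a small cavity around the origin.
pocket = [(4.0, 0, 0), (-4.0, 0, 0), (0, 4.0, 0), (0, -4.0, 0), (0, 0, 4.0), (0, 0, -4.0)]

# Hypothetical candidates, each already placed in a single pose.
library = {
    "snug_fit": [(0.0, 0.0, 0.0)],                  # sits in the centre of the cavity
    "too_big": [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],  # second atom collides with the wall
    "outside_pocket": [(20.0, 0.0, 0.0)],           # never enters the cavity
}

for name, score in rank_library(library, pocket):
    print(f"{name:15s} {score:12.3f}")
```

Only the compound that fills the cavity without clashing earns a strongly negative score, loosely mirroring the "key in a lock" fit described for the HIV protease drug.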

Research is ongoing to determine how amino acid
chains fold into three-dimensional protein structures
that confer specific functions. Additional research is
necessary for scientists to better understand the
relationship between a three-dimensional protein
structure and the biochemical activity of the protein.
In 2007, for the first time, a team of scientists
(University of California, San Francisco; Albert Einstein
College of Medicine; Texas A&M) determined the
biological function of an enzyme based only on the
three-dimensional structure of the protein. The
scientists used computer-aided modeling and molecular
docking to search a database of small molecules for
candidates that fit into the enzyme’s active site.

It is possible to use computer-aided molecular dock- 
ing when the researchers know the atomic structure of 
the “active site” of a specific enzyme. The computer
searches through many thousands of different
molecules, checking each one to find the molecule
that fits most tightly into the three-dimensional shape
of the empty active site of the enzyme. Earlier computer
approaches to finding specific enzyme substrates were
often unsuccessful because they relied only on stable
molecular structures. However, this new research
accomplished an impressive computational feat by
producing simulated candidate substrates that
accurately mimic the molecular structures of unstable
“intermediate” forms. These transient structures exist
only very briefly during the biochemical reaction, as
the enzyme binds the substrate and transforms it into
the product molecule. Because the intermediate
structures are so unstable, they are not detected by
tests designed to find out how stable molecules fit into
the active site of the enzyme. In some cases the fit
between a candidate substrate molecule and the
enzyme active site can be predicted by the computer
molecular docking experiments and then confirmed by
x-ray crystallography studies of the atomic structure.


Some new drugs are synthetic molecules designed to 
fit into the active site of a key enzyme. This blocks the
action of the enzyme and prevents the disease process 
from continuing to damage the cells. 


Drug Design and Personalized Medicine 


The National Institutes of Health (NIH) sponsors 
research in the United States to determine why indi- 
viduals have very different physical reactions to certain 
medicines. For example, some people get side effects 
but no pain relief from common painkiller medica- 
tions, and certain allergy and asthma drugs work well 
for some people but are not at all effective for others. 
In 2008, millions of people in the United States risked
an overdose reaction from taking standard amounts of a
medicine that is commonly used to prevent blood clots.
This NIH-supported research is focused on improving 
the health of all Americans by studying human cells and 
finding out what goes wrong when the human body 
experiences disease or injury. This research is directed 
at optimizing the positive effects of common, often 
essential, medications while preventing allergic reac- 
tions and other serious side effects. 

Different people often respond differently to common
drugs, including medicines that lower cholesterol levels,
anticancer drugs, and anti-AIDS medications. In many
cases these diverse reactions can be linked to naturally
occurring variations in individual human genome DNA
sequences, which influence how individual people
respond to different drugs. Eventually, results from this
research will be used routinely by health care providers
to prescribe the most effective medicines and treatments
for individual patients. The researchers focus on the
liver, the key organ responsible for metabolizing the
wide array of drugs and toxins that enter the human
body. But liver function varies among people. For
example, some liver enzymes act on painkiller drugs
to convert them into an active form inside the body.
The proteins that activate painkillers show varying
levels of function in different individuals, and these
differences correlate with DNA variation in individual
human genomes.
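The link between DNA variants and how well a liver enzyme activates a painkiller is often summarized by adding up a function value for each of a person's two copies of the gene. The sketch below follows that general "activity score" idea, but the allele names, function values, and category cut-offs are hypothetical placeholders, not clinical guidance for any real gene or drug.

```python
# Hypothetical function values for variants of a drug-activating liver enzyme gene.
ALLELE_FUNCTION = {
    "normal": 1.0,
    "decreased": 0.5,
    "no_function": 0.0,
    "duplicated_normal": 2.0,  # two working gene copies on one chromosome
}


def activity_score(allele1: str, allele2: str) -> float:
    """Sum the function values of a person's two gene copies."""
    return ALLELE_FUNCTION[allele1] + ALLELE_FUNCTION[allele2]


def metabolizer_category(score: float) -> str:
    """Map an activity score to an illustrative (hypothetical) category."""
    if score == 0:
        return "poor"          # little active drug formed: weak pain relief
    if score < 1.25:
        return "intermediate"
    if score <= 2.25:
        return "normal"
    return "ultrarapid"        # active drug formed quickly: higher risk of side effects


for pair in [("normal", "normal"), ("decreased", "no_function"),
             ("no_function", "no_function"), ("duplicated_normal", "normal")]:
    score = activity_score(*pair)
    print(pair, score, metabolizer_category(score))
```

The same standard dose of a painkiller could therefore produce very different amounts of active drug in people whose genotypes fall into different categories.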

Scientists have analyzed the genomes of people who 
have unusual reactions to common drugs, in order to 
identify variations in the DNA sequences of genes that 
are potentially involved in these atypical reactions. 
Studies designed to correlate diseases with sequence var- 
iations between individual human genomes will provide 
important information to help doctors prescribe the right 
medicine for each patient. In the future, personalized 
medicine will make drugs more effective and less expen- 
sive, and will decrease the unnecessary side effects that 
often result from one-size-fits-all medical treatment. 
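One simple way to test whether a DNA variant is more common among people with an unusual drug reaction is a chi-square test on a 2 × 2 table of allele counts. The sketch below is a minimal version (Pearson chi-square with one degree of freedom, no continuity correction); the counts are invented for illustration, and real genome-wide studies test many variants at once and must correct for multiple testing.

```python
import math


def allelic_association(case_variant: int, case_ref: int,
                        control_variant: int, control_ref: int):
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    on a 2x2 table of allele counts. Returns (chi2, p_value, odds_ratio)."""
    table = [[case_variant, case_ref], [control_variant, control_ref]]
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_variant * control_ref) / (case_ref * control_variant)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts: people with an atypical reaction vs. a comparison group.
chi2, p, odds = allelic_association(case_variant=40, case_ref=60,
                                    control_variant=20, control_ref=180)
print(f"chi-square = {chi2:.1f}, p = {p:.2e}, odds ratio = {odds:.1f}")
```

With these hypothetical counts the variant is carried on 40% of alleles in the atypical-reaction group versus 10% in the comparison group, giving an odds ratio of 6.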


Research in drug development requires a detailed under- 
standing of the biochemical mechanisms involved in the 
disease process. Protein-protein interactions play a central 
role in most diseases and often reveal successful targets for 
drug intervention and disease treatment. 


Modular Drugs: Leukemia Drug Shows 
Early Promise 


A new type of bioengineered medicine called a modu- 
lar immune-pharmaceutical drug was recently shown 
to be effective for the treatment of chronic lymphocytic 
leukemia (CLL) in lab animals. The new antileukemia 
drug, called CD37-SMIP (Trubion Pharmaceuticals,
Inc.), binds to the CD37 proteins located on the
surfaces of the leukemia cells. The binding of the
CD37-SMIP drug to the CD37 protein sends a signal into the
leukemia cell, which triggers programmed cell death 
(apoptosis). This process of cell suicide is used by the 
body for injury repair and to eliminate unnecessary 
cells during embryo development (see Chapter 9). 
Treatments with the new leukemia drug, CD37-SMIP, 
proved to be as effective as the standard anticancer 
drug, rituximab, which binds to a different target pro- 
tein on the surface of the leukemia cells (Figure 13.10). 

This new drug might be effective in treating other types
of cancers that also express the CD37 protein on the 
FIGURE 13.10 Some anticancer drugs bind to target proteins on
the surface of the cancer cells. Rituximab is an anticancer drug
developed from a monoclonal antibody (“Y”) that binds to a specific
target protein on the surface of leukemia cells. In combination
with chemotherapy, rituximab (Rituxan) has helped to treat patients
with non-Hodgkin’s lymphoma, a cancer that occurs in about 20 out of
100,000 people.


cell surface, including non-Hodgkin’s lymphoma and 
acute lymphoblastic leukemia. The cell suicide mech- 
anism triggered by the new anticancer CD37-SMIP 
drug differs from other anticancer drugs that induce 
apoptosis by triggering a cascade of caspase enzymes. 
This new type of anticancer agent could be an effec- 
tive treatment for patients who have become resistant 
to other anticancer drugs, because the function of 
the CD37-SMIP drug is independent of the caspase 
enzyme pathway. 


Teaching an Old Drug New Tricks 


Rather than screening cells with thousands of small
molecules to look for a specific biochemical response
or protein activity, scientists can also design effective
new drugs by analyzing the mechanisms of less effective
medications. Some researchers reasoned that new drug
activities might be hidden within the structure of a drug
commonly used to treat a different malady. A research
team of computational structural biologists at Scripps Research Institute
(La Jolla, California) focused on prostate cancer treat- 
ments using drugs that block the function of the andro- 
gen hormone receptor, a protein that plays an important 
role in the development of many cancers. The research- 
ers started with a well-known and commonly used 
antipsychotic drug, which they transformed into a
potential treatment for prostate cancer. The researchers
studied 3D computer models of the structure of the
androgen hormone receptor and used advanced computer
algorithms to predict the 3D structure and properties
of the binding pocket in the receptor protein.
Information gathered from a large collection of binding
pocket models enabled the computer docking program
and virtual screening functions to detect molecules that
bind inside the active site pocket and specifically block
the function of the receptor protein. This approach
has also revealed inhibitors of other receptors
and proteins, including the retinoic acid receptor, the
epidermal growth factor receptor (EGFR), the anthrax
lethal factor, dynamin, and alpha-1-antitrypsin.
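The value of using many binding-pocket models instead of one static structure (the same concern raised earlier about proteins changing shape in the cell) can be illustrated by scoring each compound against every pocket model and keeping its best score. The scoring rule, pocket coordinates, and compounds below are hypothetical, and this is not the Scripps team's software; it only shows how an ensemble of receptor shapes can rescue a compound that a single rigid snapshot would reject.

```python
import math

Point = tuple[float, float, float]


def contact_score(ligand: list[Point], pocket: list[Point],
                  clash: float = 3.0, contact: float = 5.0) -> int:
    """Toy score: -1 per ligand-pocket atom pair within contact range,
    +10 per steric clash (closer than `clash` angstroms). Lower is better."""
    score = 0
    for a in ligand:
        for b in pocket:
            d = math.dist(a, b)
            if d < clash:
                score += 10
            elif d <= contact:
                score -= 1
    return score


def ensemble_rank(library: dict[str, list[Point]],
                  pocket_models: dict[str, list[Point]]) -> list[tuple[str, int]]:
    """Keep each compound's best (lowest) score over all pocket models, then rank."""
    best = {name: min(contact_score(atoms, pocket) for pocket in pocket_models.values())
            for name, atoms in library.items()}
    return sorted(best.items(), key=lambda item: item[1])


# Two hypothetical snapshots of the same flexible pocket.
pocket_models = {
    "closed": [(3.5, 0, 0), (-3.5, 0, 0), (0, 3.5, 0), (0, -3.5, 0)],
    "open": [(4.5, 0, 0), (-4.5, 0, 0), (0, 4.5, 0), (0, -4.5, 0)],
}
library = {
    "compact": [(0.0, 0.0, 0.0)],
    "elongated": [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "decoy": [(15.0, 15.0, 0.0)],
}

print("closed pocket only:", contact_score(library["elongated"], pocket_models["closed"]))
print("ensemble ranking:", ensemble_rank(library, pocket_models))
```

Against the closed snapshot alone, the elongated compound clashes with the pocket wall and scores worst; once the open conformation is included, it becomes the top-ranked candidate.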


Advances in the use of computers to analyze protein structure 
and predict accurate three-dimensional structures have greatly 
enhanced the ability of researchers to design new drugs effec- 
tive for the treatment of many different human diseases. 


LAB ON A CHIP TECHNOLOGY 


Tiny handheld lab on a chip devices are under devel- 
opment in many labs, and will soon permit physicians, 
crime scene investigators, pharmacists, and the general 
public to quickly conduct inexpensive tests on DNA 
and other biological compounds, anywhere, anytime. 
Scientists predict that within a decade the public will 
be able to purchase DNA test kits at pharmacies that 
are marketed for use at home, allowing people to self-test
for a variety of diseases, even before the patient
has developed any symptoms. The handheld lab-on-a-chip
devices are about the size of a microscope slide,
and will house a variety of miniature analytical labo- 
ratory tools (Figure 13.11). The handheld lab requires 
tiny reaction volumes that are thousands of times 
smaller than those used in a normal lab, which allows 
the reactions to proceed at rates that are 100 times 
faster than in a traditional lab. 

In 1998 Hewlett-Packard Company and Caliper 
Technologies Corp. announced a collaboration to fur- 
ther develop lab-on-chip technology based on the orig- 
inal studies done at Oak Ridge National Laboratories, 
the University of Alberta, and Ciba-Geigy. Caliper
Technologies developed practical applications of
microfluidics, a major advance in this technology.
The Caliper scientists
etched tiny integrated biochemical processing circuits 
into glass, silicon, quartz, or plastic supports to create 
channels about 80 micrometers wide and 10 microm- 
eters deep. These lab chips perform the same protocols 
as conventional instruments, but the lab chip uses very 
tiny amounts of sample and performs at speeds that are 
much faster than usual because the fluids move at very 
high rates (Figure 13.12). 
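A rough scaling argument helps explain why shrinking a reaction speeds it up: the time for molecules to diffuse a distance x grows roughly as x²/(2D), where D is the diffusion coefficient. The sketch below compares the 10-micrometer channel depth mentioned above with an assumed 1-millimeter distance in a conventional tube, using an assumed D of about 10⁻⁹ m²/s, a typical order of magnitude for small molecules in water. These are back-of-the-envelope estimates, not measurements of any particular chip.

```python
def diffusion_time_s(distance_m: float, diffusivity_m2_s: float) -> float:
    """Characteristic one-dimensional diffusion time, t ~ x**2 / (2 D)."""
    return distance_m ** 2 / (2 * diffusivity_m2_s)


D_SMALL_MOLECULE = 1e-9  # m^2/s, assumed typical value for a small molecule in water

chip_channel_m = 10e-6   # 10 micrometer channel depth (from the text)
lab_tube_m = 1e-3        # 1 millimeter, an assumed distance in a conventional tube

t_chip = diffusion_time_s(chip_channel_m, D_SMALL_MOLECULE)
t_tube = diffusion_time_s(lab_tube_m, D_SMALL_MOLECULE)
print(f"chip: {t_chip:.2f} s, tube: {t_tube:.0f} s, ratio: {t_tube / t_chip:.0f}x")

# Volume held by 1 mm of an 80 um wide x 10 um deep channel (1 m^3 = 1e12 nL).
volume_nl = 80e-6 * 10e-6 * 1e-3 * 1e12
print(f"{volume_nl:.1f} nL per millimeter of channel")
```

Shrinking the distance 100-fold cuts the estimated diffusion time about 10,000-fold, and one millimeter of an 80 × 10 micrometer channel holds less than a nanoliter of fluid.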

Voltages are applied to various channel intersections 
to direct the sample in the fluid channels throughout 
the chip, adjusting concentrations across three orders of 
magnitude, separating components, adding fluorescent 
tags, and sending the sample past detection devices for 
digital output. The microchip lab devices are designed 
to be flexible and in the future will offer an array of 
multipurpose workstations, with chips on hand to 
convert an instrument from one function in the morn- 
ing to a different one in the afternoon. In combination 
with DNA on a chip and microarray technology, the 
FIGURE 13.11 Lab on a chip technology. (A) Scientists in a traditional molecular biology lab studying DNA and proteins will set up experiments using micropipettes and involving tiny reaction volumes that take place in small plastic microfuge tubes. The idea behind the lab-on-a-chip devices currently under production is to miniaturize the traditional experimental protocols and reproduce them on a microchip. The lab-on-a-chip devices will rapidly determine the chemical composition of substances as well as perform DNA testing and more.
science of microfluidics has the potential to dramatically
improve the accuracy and speed of current diagnostic
tests (see Figure 13.12). In the future, handheld
lab devices will be used by health care professionals
to screen for serious diseases, including some cancers
and genetic defects, and for rapid testing for an array
of infectious diseases, including HIV, anthrax, the
avian flu, and the swine flu (H1N1 flu). One lab on a
chip is being designed to travel to Mars to test for
amino acids that could signal possible past life forms
on the red planet (Figure 13.13).


Lab on a chip technology allows the microchip to perform 
the same protocols as conventional instruments, except 
much faster and using much smaller samples. 


Lab-on-a-chip technology is part of the trend 
toward “personalized medicine,” in which health care 
is increasingly tailored to the specific genetic profile 
of each patient. Highly specialized personalized care 
allows physicians to design therapies and prescribe 
specific medications for patients while also consider- 
ing the genetic profile of each patient. Together with 
simplified genetic testing, the lab-on-a-chip tech- 
nology will reduce the overall cost of DNA testing 


FIGURE 13.12 A mini-lab on a chip can do it all. The Agilent 
Bioanalyzer (Agilent Technologies) carries out highly reproduc- 
ible chemical reactions and DNA manipulations by utilizing a net- 
work of channels and wells etched into glass or polymer chips that 
accommodate very small sample and reagent volumes. Tiny changes 
in pressure or electro-kinetic forces propel the tiny volumes of flu- 
ids through the channels in a highly controlled process that depends 
on the desired test. The lab-on-a-chip can handle multiple samples 
without making an error while it performs tasks such as mixing or 
diluting reagents, resolving DNA, RNA, and proteins by
electrophoresis or chromatographic separation, and adding
tags to provide visual detection.
and will help to spread the idea of personalized care
based on individual genetics. As this microtechnol- 
ogy is improved and refined, the costs of producing 
the handheld devices will go down. More people and 
health care groups will be able to afford the purchase 
price and the handheld labs will be adopted for use in 
a wide variety of clinical settings. 

The diagnosis of human disease is complicated. For 
example, a magnetic resonance imaging (MRI) scan of 
the brain is one way to diagnose brain health, but even 
an MRI cannot reveal the health of individual nerve 
cells (Figure 13.14). In many cases, early diagnosis of 
a disease significantly improves the patient’s chances 
of recovery, especially in the case of early cancer diag- 
nosis. Eventually the technology of “proteomic finger- 
printing” will be used to routinely reveal the protein 
expression patterns in healthy and diseased cells and 
will provide specific tests to rapidly assess the health 
of tissues and organs in an individual patient. Similar 
lab on a chip technology will be customized for use
by law enforcement officers and forensic investigators
who need to gather information quickly and often
need to use small biological samples. The handheld 
nanolab will analyze tiny samples of blood or semen 


FIGURE 13.13 Lab on a chip to visit Mars. This lab on a chip 
device is designed to fly to Mars in several years to look for signs 
of life (UC Berkeley). The chip has a microcapillary electrophoresis 
system that can determine the composition and “handedness” of any 
amino acids found on Mars. As chemicals, amino acids can exist
as either left- or right-handed molecules, but living organisms on
Earth build their proteins from left-handed amino acids. For this
reason scientists need to determine whether any amino acids found
in space are left- or right-handed.
FIGURE 13.14 MRI scans are one way to diagnose brain health and nerve function. These colorful pictures are magnetic resonance imaging
(MRI) scans of a healthy human brain. MRI is a common method used to diagnose certain illnesses but requires a trip to the hospital or clinic. 
Scientists are developing a large array of tests for people to take at home in the future that would potentially check the health of every organ 
in the human body including the brain. A similar device could be used to screen people for infection during a flu epidemic or other infec- 
tious health emergency. 


Box 13.2 Nanotubes Help Advance Brain Tumor Research 


Science Daily, January 2008 

Nanotechnology promises to revolutionize the future of certain 
cancer therapies. Researchers at City of Hope have delivered can- 
cer-fighting agents into the cells using a new type of carbon nano- 
tube carrier. The carbon nanotubes are hollow cylinders made by 
rolling sheets of graphite-like carbon into tubes that are 50,000 
times narrower than a human hair, and can be several centim- 
eters long [Nano and Micro Systems Groups, Jet Propulsion Lab 
(JPL)]. These new carbon nanotubes efficiently carried drugs, 
DNAs, and siRNAs into cells, were not toxic to the brain cells of 
mice, and did not inhibit or promote cell reproduction. 

JPL scientists and City of Hope doctors decided to 
collaborate on developing nanostructures to better diag- 
nose and treat brain cancer. This nanomedical research col- 
laboration focuses on attaching inhibitory RNA molecules to 
the nanotubes, which will be delivered into specific target 


regions of the brain. If the nanotube delivery technology can 
be effectively used to treat brain tumors, the same approach 
might also work for treating stroke, trauma, nerve disorders 
and other diseases that affect the brain. 

Carbon nanotubes are extremely strong, flexible, and 
heat-resistant structures, making them excellent for use as 
field-emission cathodes, which help produce electrons, and 
for various aerospace applications involving x-ray and mass 
spectroscopy, vacuum microelectronics and high-frequency 
communications. “Nanotubes are important for miniaturizing 
spectroscopic instruments for space applications, developing 
extreme environment electronics, as well as for remote sens- 
ing,” said Harish Manohara, a JPL supervisor. Nanotubes are 
not usually part of current NASA space missions, but they 
could be used in gas-analysis or mineralogical instruments for 
future missions to Mars, and other planets. 
right at a crime scene, and the results will be
immediately compared to a genetic database, with the
intent of identifying a suspect soon after the crime.
Applications of nanolab technology would also affect
many areas of agricultural biotechnology, including
rapid genetic tests performed on thousands of hybrid
plants to screen for desirable traits such as resistance
to disease and drought.
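Comparing a crime-scene DNA result with a database can be sketched as matching genetic profiles locus by locus. Forensic databases typically store short tandem repeat (STR) profiles, in which each locus is recorded as a pair of repeat counts. The locus names below are real STR markers, but every allele value and database entry is hypothetical, and real systems type many more loci and report the statistical weight of a match rather than a simple yes or no.

```python
Profile = dict[str, tuple[int, int]]


def normalize(profile: Profile) -> Profile:
    """Sort each allele pair so (14, 13) and (13, 14) compare as equal."""
    return {locus: tuple(sorted(alleles)) for locus, alleles in profile.items()}


def is_match(sample: Profile, reference: Profile) -> bool:
    """True if the profiles agree at every locus typed in both (and share at least one)."""
    s, r = normalize(sample), normalize(reference)
    shared = s.keys() & r.keys()
    return bool(shared) and all(s[locus] == r[locus] for locus in shared)


def search_database(sample: Profile, database: dict[str, Profile]) -> list[str]:
    """Return the names of all database profiles consistent with the sample."""
    return [name for name, ref in database.items() if is_match(sample, ref)]


# Hypothetical allele values for illustration only.
crime_scene = {"D8S1179": (13, 14), "TH01": (6, 9), "FGA": (21, 24)}
database = {
    "profile_001": {"D8S1179": (14, 13), "TH01": (9, 6), "FGA": (24, 21)},
    "profile_002": {"D8S1179": (13, 14), "TH01": (7, 9), "FGA": (21, 24)},
}
print(search_database(crime_scene, database))
```

The second entry agrees at two loci but differs at TH01, so only the first entry is reported as consistent with the crime-scene sample.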


NANOTECHNOLOGY 


Nanotechnology research takes advantage of the ability 
to manipulate atoms and molecules to create new mate- 
rials and nano-devices. Certain metallic and carbon- 
based nanometer-sized structures exhibit novel physical 
and chemical properties when they are produced in 
miniature. Examples of these dramatic property changes
include opaque substances that become transparent as
nanostructures (copper), inert materials that become
potent chemical catalysts (gold), and insulating materials
that transform into efficient nanoconductors (silicon). The
novel nanoscale properties of these tiny structures have 
the potential to produce a wide variety of useful nanos- 
cale products with many potential applications, includ- 
ing pharmaceutical drug design and miniature devices 
developed for use in medical diagnosis and treatment. 
Nanotechnology research focuses on developing effi- 
cient and affordable ways to assemble carbon-based 
molecules into novel nanostructures. 

The field of nanotechnology focuses on the creation
and manipulation of nano-sized particles, with diameters
of 100 nanometers or smaller, to create materials
or devices for industrial and medical applications. The
upper size limit in nanotechnology, 100 nm, is much
smaller than bacteria or human cells (Figure 13.15).
One nanometer (nm) is one-billionth of a meter
(10⁻⁹ m), much too small to be seen with a
conventional light microscope (see Figure 13.15).
To put these tiny dimensions in perspective, the
distances between adjacent atoms in a molecule range
from about 0.12 nm to 0.15 nm, just over a tenth of a
nanometer. Among biologically important molecules,
the DNA double helix (double-stranded DNA) is 2 nm
in diameter. Cells range in size from tiny bacteria
(about 200 nm long) to the large eggs of birds and
dinosaurs; a typical animal cell averages about
10 µm in length.


The Amazing Science of Carbon 
Nanotechnology 


The science behind carbon nanotechnology followed
from the discovery of buckminsterfullerene (C60) in
1985 (Figure 13.16). Some chemical elements, such as
carbon, can exist in two or more different structures,
called allotropes. The “bucky ball” allotrope
is a soccer ball-shaped structure containing 60 carbon
atoms, with one carbon atom positioned at each corner
of the 20 hexagons and 12 pentagons. This specific
configuration of ordered carbon atoms makes an
extremely stable structure (Figure 13.17). The earliest
carbon nanotubes were structurally imperfect, and as a
result they did not exhibit particularly interesting
properties. In 1990, researchers found that the bucky
ball C60 structure could be produced in an arc
evaporation apparatus, which eventually led to the use of this
method to create fullerene-related cylindrical carbon 
nanotubes. Certain carbon-based nano-sized materials 
exhibit intriguing physical and chemical properties that 
change dramatically as nanostructures, such as chang- 
ing from opaque to transparent, converting inert mate- 
rials into potent chemical catalysts, and transforming 
insulating materials into efficient nanoconductors. 
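The bucky ball's geometry can be checked with simple counting. If every carbon atom sits at a corner shared by three rings and every bond is an edge shared by two rings, the numbers of atoms, bonds, and rings follow from the numbers of hexagons and pentagons, and any closed cage must satisfy Euler's polyhedron formula, atoms − bonds + rings = 2.

```python
def fullerene_counts(hexagons: int, pentagons: int) -> tuple[int, int, int]:
    """Count atoms (vertices), bonds (edges), and rings (faces) of a closed
    carbon cage in which every atom is shared by three rings and every
    bond by two rings."""
    corners = 6 * hexagons + 5 * pentagons  # ring corners, counted once per ring
    atoms = corners // 3                     # each atom is a corner of 3 rings
    bonds = corners // 2                     # each bond is an edge of 2 rings
    rings = hexagons + pentagons
    return atoms, bonds, rings


atoms, bonds, rings = fullerene_counts(20, 12)
print(atoms, bonds, rings, atoms - bonds + rings)
```

For 20 hexagons and 12 pentagons this gives 60 atoms, 90 bonds, and 32 rings, and 60 − 90 + 32 = 2. Working the same algebra for h hexagons and p pentagons gives p/6 = 2, which is why a closed cage built only from hexagons and pentagons always has exactly 12 pentagons.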

To make a carbon nanotube, a graphite sheet is 
rolled into a nanocylinder, so that the pattern of hexa- 
gons arranged around the circumference of the cylin- 
der reflects specific vector indices (n,m) (Figure 13.18). 
For example, to produce a nanotube with the indices 
(6,3), the graphite sheet is rolled up so that the atom 
FIGURE 13.15 Relative sizes of cells and sub-cellular components. The scale shown compares a range of sizes of various cells and cellular
components (DNA helix, virus, bacterial cell, animal cell and plant cell). The bar below the ruler indicates the overlapping size ranges of 
objects that are visible with a light microscope and an electron microscope. The upper size limit for nanotechnology is 100 nm. 
FIGURE 13.16 Different structures (allotropes) are made by different bonds between carbon atoms. (A) Diamond. (B) Graphite. (C) Fullerene. (D)
Nanotube. Colors in (A), (C), and (D) represent depth, with red closest to the viewer, and violet (or blue for [C] and [D]) farthest away. 
FIGURE 13.17 Geodesic domes resemble buckyballs. (A) In 1985 a new allotrope of carbon called buckminsterfullerene (C60) was discovered
and named after American architect and designer Richard Buckminster “Bucky” Fuller (1895–1983). The 60 carbon atoms form the shape of a
soccer ball with a carbon atom at each corner of 20 hexagons and 12 pentagons. (B) A geodesic dome home designed by architect Buckminster
“Bucky” Fuller.


FIGURE 13.18 Amazing carbon nanotubes often have unusual and 
unexpected properties. 


labeled (0,0) is superimposed in the cylinder on top of 
the atom labeled (6,3). The extreme light weight and 
strength characteristics of the tiny carbon nanotube 
cylinders are a direct result of the physical and chemi- 
cal properties conferred by the nano-size structure. 
The nanotube structures are called “zig-zag” (m = 0), 
“armchair” (n = m), and the chiral tube, which can 
exist in one of two mirror-image forms (Figure 13.19). 
Modern carbon nanotubes have either one wall 
(single-walled) or more than one wall (multiwalled), 
but the earliest nanotubes were multiwalled cylinders 
with an outer diameter measuring from 3 nm to 30 nm
(Figure 13.20). In 1993, single-walled carbon nano- 
tubes were created, with narrower diameters (1 to 2 nm) 
than the multiwalled nanotubes. The ability to con- 
struct the new single-walled carbon nanotubes was 
an important advance in the field because the single- 
walled nanotubes exhibit novel electric properties that 
are not exhibited by the multiwalled carbon nano- 
tubes. For example, the single-walled nanotubes make 


FIGURE 13.19 Single-walled carbon nanotubes. (A) Armchair. (B) Zig-zag. (C) Chiral nanotube.


FIGURE 13.20 Single and multilayered carbon nanotubes visualized using an electron microscope. (A) Multiwalled nanotubes. (B) Single- 
walled carbon nanotubes. These new fibers exhibit a range of exceptional properties that prompted a surge of research into the new carbon 
nanotubes. 


excellent electrical conductors and play a central role 
in the first intramolecular field effect transistors. Single- 
walled nanotubes are also the best candidates for the 
construction of miniaturized electronic devices that 
can operate routinely at nanoscale dimensions inside 
human cells. 
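The (n,m) indices described above determine a single-walled nanotube's diameter, its chiral angle, and, to a first approximation, whether it conducts like a metal or behaves as a semiconductor. The sketch below uses the standard geometric relations for a rolled graphene sheet, with an assumed lattice constant of 0.246 nm, and the simple rule that a tube is metallic when n − m is divisible by 3; that rule ignores curvature effects that open small band gaps in some narrow tubes.

```python
import math

LATTICE_CONSTANT_NM = 0.246  # graphene lattice constant (assumed value)


def nanotube_properties(n: int, m: int) -> dict:
    """Describe a single-walled carbon nanotube from its (n, m) chiral indices."""
    if not (n >= m >= 0 and n > 0):
        raise ValueError("use the convention n >= m >= 0 with n > 0")
    # Diameter: d = a * sqrt(n^2 + n*m + m^2) / pi
    diameter = LATTICE_CONSTANT_NM * math.sqrt(n * n + n * m + m * m) / math.pi
    # Chiral angle: 0 degrees for zig-zag tubes, 30 degrees for armchair tubes
    angle = math.degrees(math.atan2(math.sqrt(3) * m, 2 * n + m))
    if m == 0:
        kind = "zig-zag"
    elif n == m:
        kind = "armchair"
    else:
        kind = "chiral"
    # Simple zone-folding rule; ignores curvature effects
    electronic = "metallic" if (n - m) % 3 == 0 else "semiconducting"
    return {"kind": kind, "diameter_nm": diameter,
            "chiral_angle_deg": angle, "electronic": electronic}


for indices in [(6, 3), (10, 10), (10, 0)]:
    props = nanotube_properties(*indices)
    print(indices, props["kind"], f"{props['diameter_nm']:.3f} nm",
          f"{props['chiral_angle_deg']:.1f} deg", props["electronic"])
```

The (6,3) tube from the example above comes out as a chiral tube about 0.62 nm across, while the (10,0) zig-zag tube is predicted to be semiconducting, which helps explain why some single-walled tubes conduct like metals and others do not.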

The much desired single-walled nanotubes are 
expensive to produce, but work is under way to find 
affordable synthetic methods that will allow the routine 
use of carbon nanotube technology in future applica- 
tions. Commercial applications of discoveries made in 
nanotechnology have emerged slowly, partly because 
of the high cost of producing high-quality nanotubes.
Still, the amazing characteristics of carbon nanotubes
give this area of nanotechnology great potential
for future applications in high-technology fields that 
require lightweight, extremely strong materials, with 
nanocharacteristics such as superconductivity. 

Interestingly, for the first time two different carbon 
allotrope structures, a fullerene and a nanotube, were 
combined to make a single nanobud structure that exhib- 
its some properties from both fullerenes and carbon nan- 
otubes (Figure 13.21). The fullerene-like “bud” is bonded 


to the outer wall of the carbon nanotube, which greatly 
strengthens the mechanical properties of the composite 
structure and makes an effective molecular anchor that 
prevents the attached nanotube from moving. 


Clean Energy: Carbon Nanohorns 
Store Hydrogen 


The hydrogen-carbon bonds that form the cone-shaped 
nanohorn structures are much more stable than the 
hydrogen-carbon bonds in the nanotube structures, 
making nanohorns an excellent option for industrial 
and medical applications of carbon-based nanomateri- 
als (Figure 13.22). Nanohorns not only have remark- 
able adsorptive and catalytic properties, but French 
researchers discovered that carbon nanohorns offer 
an efficient and inexpensive way to store hydrogen in 
hydrogen-powered fuel cells, a source of clean energy. 
Hydrogen is an abundant, renewable energy source 
that could replace fossil fuels, but the use of hydrogen 
is limited because it is difficult to store. However, the 
carbon nanohorns trap the hydrogen in an efficient 
FIGURE 13.21 Hybrid carbon nanostructures. (A) Nanobud. (B) This potential nanodiode was constructed by joining an “even” rolled
graphite sheet, which has semiconducting properties, to a “spiral” rolled sheet, which is predicted to have metallic characteristics.


FIGURE 13.22 Carbon nanohorns. Carbon nanohorns are strong 
and light, and are much more stable than other carbon-based nano- 
materials, including nanotubes. 


process that permits all of the adsorbed hydrogen to be 
recovered. 

The unique properties exhibited by carbon nano- 
structures are essential for the development of many 
novel materials with potential applications in areas 
such as medicine, electronics, and energy produc- 
tion. As expected for a new scientific field, people 
have raised questions about nanotechnology, includ- 
ing safety issues, the possible environmental impact 
of nanomaterials and the need for special federal or 
state regulation of the rapidly growing nanotechnology 
industry. 


Nanotechnology produces novel nanomaterials and nanos- 
tructures that have a myriad of different applications in 
diverse areas of science. 


NANOMEDICINE 


The applications of nanotechnology to medicine,
called nanomedicine, include advances in drug
delivery, imaging and diagnostics, biosensors, targeted
drug delivery, tissue engineering, and nanotoxicology. In
the future, molecular machines like nanorobots will 
perform the rapid detection, diagnosis and elimination 
of pathogens inside the living human body, eventually 
performing surgery on individual cells. The nanodoc- 
tors will be tiny, less than 100 nanometers (nm), the 
appropriate size needed to operate within a living 
human cell. 

The national importance of nanotechnology research 
to the federal government was recognized when the 
National Institutes of Health (NIH) established eight 
Nanomedicine Development Centers in the United 
States, staffed by teams of biologists, physicians, 
mathematicians, engineers, and computer scientists. This 
NIH program gives researchers an opportunity to 
combine nanoscale research with the ability to 
manipulate cellular nanostructures and design novel 
medical therapies. The scientists compiled extensive 
information on the chemical and physical properties of 
nanoscale cellular structures and studied the interactions 
between individual molecules and multiprotein 
complexes to increase our knowledge of the molecular 
structures and processes in living cells. This information 
is necessary for researchers to design strategies to 
correct the structural defects and functional problems 
in diseased cells. Scientists also developed new 
nanotechnology tools to explore and manipulate 
nanoscale biological structures. 


FIGURE 13.23 DNA helix and carbon nanotube act as a biosensor device. (A) Researchers have developed a DNA-nanotube sensor that detects tiny amounts of metal ions in blood, tissue, and cells. (B) When the DNA-nanotube senses metal ions in the environment, the DNA wrapped around the nanotube changes its helical conformation, which in turn changes the infrared emission from the nanotube sensor. 


New nanodevices may be able to perform a wide range of 
biomedical tasks in the future, including tiny nanodoctors 
that diagnose infectious agents, detect metabolic 
disorders, and repair or replace defective or broken 
cellular components. 


DNA Nanotube Biosensor 


A novel DNA-nanotube sensor developed at the 
University of Illinois at Urbana-Champaign can detect 
the presence of metal ions in biological samples such 
as blood, tissues, and cells. Double-stranded DNA 
helices are wrapped around semiconducting, 
single-walled carbon nanotubes (Figure 13.23). In the 
presence of metal ions, the DNA helix changes conformation 
around the nanotube, which changes the infrared 
emission from the nanotube sensor. 

Researchers have also developed nanotubes coated 
with different organic materials that allow the 
nanotubes to act as a diagnostic test for lung cancer by 
sampling the molecules in a patient’s breath. The first 
such nanodevice contained networks of single-walled 
carbon nanotubes, each coated with 1 of 10 different 
organic materials. Each organic coating was designed 
to give a distinct response when the detector is 
exposed to the more than 200 volatile organic chemicals 
present in human breath. The nanotube devices were 
calibrated using breath from 15 healthy nonsmokers 
and 15 individuals with stage 4 lung cancer. The organic 
compounds in each breath sample were concentrated 
and identified by gas chromatography and mass 
spectrometry. The same samples were then tested using 
the nanotube sensor array, and the results were compared. 
The electrical output of the nanotube array varies in 
response to the exact mixture of organic compounds 
in the breath. Using the nanotube sensor device, the 
researchers found that they could reliably distinguish 
between the breath patterns of the healthy people and 
the lung cancer patients. 
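The pattern-matching step can be illustrated with a small sketch. The section does not say which statistical method the researchers used, so the nearest-centroid rule below is an assumption chosen for simplicity, and the three-sensor response values are invented for illustration, not measurements (the real array had 10 coated nanotube sensors). Each group's average response pattern is computed from training samples, and a new breath sample is assigned to the group whose average pattern it most resembles.

```python
import math


def centroid(vectors):
    """Average response of each sensor across a group of training samples."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def distance(a, b):
    """Euclidean distance between two sensor-response patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(sample, centroids):
    """Assign a breath sample to the group whose average pattern is closest."""
    return min(centroids, key=lambda label: distance(sample, centroids[label]))


# Hypothetical, made-up responses from a 3-sensor array (not measured data)
training = {
    "healthy": [[0.10, 0.20, 0.15], [0.12, 0.18, 0.14]],
    "lung cancer": [[0.30, 0.25, 0.40], [0.28, 0.27, 0.42]],
}
centroids = {label: centroid(vectors) for label, vectors in training.items()}

print(classify([0.29, 0.26, 0.39], centroids))
```

Here the new sample lies much closer to the hypothetical “lung cancer” average than to the “healthy” one, so it is assigned to the lung cancer group.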


The amazing products made by carbon nanotechnology 
have great potential for many applications in diverse fields 
that require lightweight, extremely strong materials with 
unusual electrical or magnetic properties. 


Carbon Nanoparticles Make New 
Delivery Systems 


The fight against cancer now includes the field of 
nanotechnology. Advances in nanotechnology offer a new way 
to overcome problems limiting the use of platinum-based 
anticancer drugs. Because of delivery problems, these 
strong drugs often become inactive in the body long 
before reaching the tumor. Researchers at the Massachusetts 
Institute of Technology and Stanford University made special 
carbon nanotubes that function as a “longboat delivery 
system” designed to protect the platinum-based 
anticancer drugs as they are carried through the body to the 
tumor (Figure 13.24). 

The researchers attached the platinum drug to a 
single-walled carbon nanotube and created a longboat 
delivery system that efficiently transports the “warhead” 
to the tumor and releases the active platinum drug. 
Tests in cultured human cells showed that the nanotube 
longboat delivery system resulted in platinum levels that 
were significantly higher than when the platinum drug 
was administered by injection. The new carbon long- 
boats can also transport other types of cargo to other 
target body sites in addition to the anticancer drugs. 

Researchers have developed special nanoparticles 
designed to target cancer cells after metastasis, 
when the cancer cells have left the primary tumor and 
spread through the body, making them difficult to remove by 
surgery. Special nanoparticles with gold shells surrounding 
silica cores absorb radiation and generate heat, effectively 
killing the heat-sensitive tumor cells. The 
effectiveness of this new system was tested in mice, 
where it killed the cancer cells without damaging 
the surrounding healthy tissues. Researchers in Japan 
used a similar approach to treat tumor cells in mice by 
filling carbon nanohorn structures with a photodynamic 
agent, zinc phthalocyanine. After exposure to a 
laser beam for 15 minutes each day for 10 days, the 
tumors disappeared. 

Cells use many approaches to control which genes 
are expressed, including an RNA interference (RNAi) 
mechanism that blocks the expression of some genes 
(see Chapter 11). The RNAi control mechanism relies on 
small interfering RNA molecules (siRNA) that base pair 
with specific target messenger RNA (mRNA) molecules in 
the cell, blocking synthesis of the protein encoded by 
that mRNA. Specially designed small interfering RNA 
molecules can block protein production in mammalian 
cells and, as a result, stop the growth of cancers and 
other diseases. 

To use siRNA therapy to treat disease, researchers 
must protect the RNA molecules in the body, where 
they are easily destroyed. This complication is avoided 
by using nanoparticles designed to transport the siRNA 
molecules directly into the target cells, as described for 
the anticancer drugs. To accomplish this feat, the outer 
surfaces of the nanoparticles are covered with special 
proteins that help to deliver the nanoparticles to the 
appropriate cells: the target cells carry surface proteins 
that bind to the proteins on the nanoparticles. Once 
inside the cells, the siRNA molecules bind to the target 
mRNAs and prevent synthesis of the targeted protein. 


FIGURE 13.24 Carbon nanolongboats help fight cancer. Nanotube longboats can carry agents to fight cancer cells. 


Nanotechnology Powered by Biomotor 
Proteins 


Human cells produce a variety of molecular machines 
that perform essential functions in many complicated 
biochemical processes such as metabolism, cell divi- 
sion, intracellular transport, and cell signaling. Cell 
components are often in motion in the cell, powered 
by special biological motor proteins. Scientists noticed 
how the kinesin and myosin biomotor proteins move 
along fibers in the cell, suggesting that the molecular 
motors could be made to perform similar functions out- 
side the cell (Figure 13.25). Biomotors are the ultimate 
cellular nanomachines. They do mechanical work by 
hydrolyzing adenosine triphosphate (ATP) to release 
energy. ATP hydrolysis changes the three-dimensional 
shapes of the motor proteins, converting chemical 
energy into the mechanical work that drives the 
movements of the proteins (see Figure 13.25). 
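The 8-nanometer step size given in the Figure 13.25 caption makes it easy to estimate how much work a kinesin motor does. The sketch below assumes that each step consumes exactly one ATP molecule (kinesin is commonly described as coupled in this way, but treat the ratio as an assumption here) and counts the steps and ATP molecules needed to haul cargo a given distance along a microtubule.

```python
import math

STEP_NM = 8.0      # kinesin step size, from the Figure 13.25 caption
ATP_PER_STEP = 1   # assumption: one ATP hydrolyzed per step


def steps_for_distance(distance_nm):
    """Whole 8-nm steps a kinesin motor needs to cover a distance."""
    return math.ceil(distance_nm / STEP_NM)


def atp_for_distance(distance_nm):
    """ATP molecules consumed, under the one-ATP-per-step assumption."""
    return steps_for_distance(distance_nm) * ATP_PER_STEP


# Hauling cargo 1 micrometer (1,000 nm) along a microtubule
print(steps_for_distance(1000), atp_for_distance(1000))
```

For a 1-micrometer trip this gives 125 steps and, under the stated assumption, 125 ATP molecules.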


FIGURE 13.25 Kinesin motor proteins assemble into a molecular machine that walks along fibers in the cell. (A) The kinesin motor machine hauls cargo in precise 8-nanometer steps along the fibers made up of tubulin proteins (microtubules). (B) Researchers recently discovered that the kinesin motor machine uses its two heads to walk in an asymmetric, hand-over-hand fashion. 


FOUR-DIMENSIONAL MICROSCOPE 
REVOLUTIONIZES OUR VIEW 


The need to visualize nanoscale events as they occur 
has increased dramatically with the rapid development 
of nanotechnology. Scientists have recently 
visualized nanoscale events in real time and in real 
space. The 4D electron microscope produces “movies” 
of the atomic changes that take place. The new 
technology is based on the traditional electron microscope, 
which directs a stream of electrons at objects and 
produces a static image with a resolution finer than a 
billionth of a meter (1 nm). Electrons are used to 
visualize molecules because the wavelength of a 
microscope’s radiation source must be shorter than the 
distance between the atoms in the sample. 
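The wavelength requirement can be checked with a short calculation. Electrons behave as waves whose de Broglie wavelength depends on the voltage used to accelerate them; the sketch below uses the standard relativistically corrected formula with CODATA values for the physical constants. The accelerating voltages (100 to 300 kV) are typical of transmission electron microscopes and are an assumption here, since the text does not state the voltage used in the 4D instrument.

```python
import math

H = 6.62607015e-34       # Planck constant, J*s
M_E = 9.1093837015e-31   # electron rest mass, kg
Q_E = 1.602176634e-19    # elementary charge, C
C = 299792458.0          # speed of light, m/s


def electron_wavelength(volts):
    """Relativistically corrected de Broglie wavelength (m) of an electron
    accelerated through a potential difference of `volts`."""
    energy = Q_E * volts
    momentum = math.sqrt(2 * M_E * energy * (1 + energy / (2 * M_E * C**2)))
    return H / momentum


for kv in (100, 200, 300):
    wavelength_pm = electron_wavelength(kv * 1e3) * 1e12
    print(f"{kv} kV: {wavelength_pm:.2f} pm")
```

At 200 kV the wavelength is roughly 2.5 picometers, far shorter than both visible light (about 400 to 700 nm) and the roughly 0.1 to 0.3 nm distances between neighboring atoms.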

The research group that developed four-dimensional 
(4D) electron microscopy at the California Institute of 
Technology was directed by Ahmed Zewail, who won 
the 1999 Nobel Prize in Chemistry for his work in 
femtochemistry, the study of chemical reactions on the 
femtosecond (10⁻¹⁵ second) timescale. 
4D electron microscopy uses ultrashort laser flashes 
to observe the behavior of atoms during fundamental 
chemical reactions, permitting scientists to watch 
atoms come together to form new molecules and 
capturing motion on a femtosecond timescale (one millionth 
of a billionth of a second). The team used ultrafast 
“single-electron” imaging, in which each electron’s trajectory 
is precisely controlled in time and space. The image 
produced by each electron represents a femtosecond 
snapshot of events at that moment. Much like 
the still frames in a film, millions of sequential 
images generated over time are assembled into a digital 
movie of events at the atomic scale. Zewail compared 
this new high-resolution technology to the freeze-frame 
still photographs taken in the 19th century, which 
required very fast camera shutters and proved for the first 
time that a galloping horse lifts all four hooves off the 
ground at the same time. 

Researchers used 4D electron microscope 
technology to visualize the movement of carbon atoms 
in material that was heated rapidly, causing the heated 
carbon atoms to vibrate in a random, nonsynchronized 
manner. Over time, the vibrations of the individual atoms 
synchronize and exhibit a heartbeat-like “nanodrumming,” 
a mechanical phenomenon that resonates at a 
frequency much higher than the human ear can 
detect. Zewail used the new 4D electron microscopy 
to observe atoms in superthin sheets of gold and graphite. 


Researchers are using the 4D microscope to obtain new 
images of cellular components by following the dynamic 
changes in complex cellular structures in real space and 
time, providing a new perspective on the functions of 
molecules and cells. 


DNA COMPUTERS 


In 2003 the world celebrated the 50th anniversary of 
the discovery of the DNA double-helix structure by 
James D. Watson, Francis H. Crick, Rosalind Franklin, 
and Maurice Wilkins. Their work explained genetic 
events in terms of the DNA helix molecule and opened 
the door to the next century of biological study of cells. 
Many thousands of researchers continue working to 
decipher the diverse ways that genes and DNA control 
the development and functions of living organisms. The 
applications of the extraordinary DNA molecule now 
extend well beyond cell and molecular biology. 
In part this is due to rapid advances in DNA technology; 
for example, modern DNA synthesis machines 
routinely produce long DNA molecules with predetermined 
sequences, opening the door to many new 
applications of DNA in diverse areas of science. 

In 1994, Leonard M. Adleman (University of Southern 
California) built the first DNA computer, which used 
DNA strands to solve a small version of the Hamiltonian 
path problem (finding a route that visits each of seven 
points exactly once). Since then, scientists have 
developed special DNA molecules that can perform the 
logic operations that are typically carried out by a 
silicon-based computer (Figure 13.26), and one DNA 
computer can actually play the game of tic-tac-toe! This DNA 
computer consists of a series of wells containing DNA 
molecules called logic gates. Each logic gate responds 
to input information in specific ways depending on the 
secondary structure (internal base pairing) of the DNA 
strand. This system, called a molecular array of YES and 
AND gates (MAYA), can play a restricted game of 
tic-tac-toe. The next generation of DNA computer, MAYA-II, 
allows an unrestricted game in which the player can make 
a move in any well (logic gate) after the computer has 
made the first move in the center square. 

The MAYA-II game of tic-tac-toe consists of a 
three-by-three array of wells containing different DNA gates. 
The player makes a move by adding a specific 
single-stranded DNA input to every well. The sequence of each 
input DNA strand corresponds to a specific move in 
tic-tac-toe, such as “middle left square, first round.” 
Although the player adds the input DNA strand to all the 
wells, only one well (the one the player wants to mark) 
contains the DNA gate that is activated by the specific 
sequence of that input strand, releasing a signal 
that represents one move in that round of play. 
The hairpin DNA in only that gate responds to the 
input DNA strands by dramatically changing its hairpin 
structure, which triggers a fluorescent signal (see Figure 
13.26). The moves made by the computer are encoded 
in the DNA sequence of each logic gate (well). The 
two players are identified by fluorescence at different 
wavelengths, so the moves made by the human 
and computer players are easily distinguished by color. 
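The routing idea behind the well array can be modeled in a few lines. In the toy sketch below, every well receives the same set of input strands, but a well lights up only when its own gate’s requirements are met: a YES gate needs one specific input, an AND gate needs two, and an always-on gate stands in for the computer’s opening move in the center square. The input names (“in4”, “in9”) and the gate wiring are hypothetical; they do not reproduce the published MAYA-II gate layout, and real gates respond through DNA hybridization rather than set comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """Toy DNA gate: fluoresces only if all of its required input strands are present."""
    required: frozenset

    def fires(self, inputs):
        # An empty requirement set behaves like a gate that is always on.
        return self.required <= inputs


# Hypothetical wiring for a 3x3 board (wells numbered 1-9);
# not the published MAYA-II layout.
WELLS = {
    5: [Gate(frozenset())],                # computer's opening move: center square
    1: [Gate(frozenset({"in4"}))],         # YES gate: responds to one input strand
    3: [Gate(frozenset({"in4", "in9"}))],  # AND gate: needs two input strands
}


def lit_wells(inputs):
    """Add the same input strands to every well and report which wells fluoresce."""
    return sorted(
        well for well, gates in WELLS.items() if any(g.fires(inputs) for g in gates)
    )


print(lit_wells(frozenset()))                # before any human move
print(lit_wells(frozenset({"in4"})))         # after input strand "in4"
print(lit_wells(frozenset({"in4", "in9"})))  # after a second input, "in9"
```

Adding an input to all wells at once works because each gate ignores strands it cannot pair with; only the matching gate changes state.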

FIGURE 13.26 The new age of DNA, the DNA computer. The newest version of the DNA computer can perform the logic operations usually carried out by a silicon-based computer. (A) The MAYA DNA computer can play tic-tac-toe. (B) This single-stranded DNA molecule represents a logical formula with the conflicting assignment b and !b (not b). Regions of the single-stranded DNA molecule that are complementary to each other base pair to make short, stable double-stranded regions of DNA (stems) connected to single-stranded loop regions. 


DNA computers may have future applications in novel 
biomedical technologies and in manufacturing, and they 
may become essential for the assembly of new nanosized 
materials and nanomachines. For example, researchers 
have used DNA strands to build a nanorobot system 
containing dozens of flipper-like molecular arms. 
This machine relies on nothing more than the specificity 
of the hydrogen bonds involved in the zipping and 
unzipping of complementary DNA bases in each gate. 
In the new DNA computer circuits, these DNA 
base-pairing interactions perform logic operations such as 
“if A and C, then D” (called an A AND C gate), 
which an electronic computer usually performs with 
a solid-state circuit representing A, C, and D. 
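The base-pairing logic of an A AND C gate can be sketched by treating an input strand as binding a gate site when its sequence is the reverse complement of that site. This exact-match rule is a simplification (real hybridization tolerates some mismatches and depends on conditions such as temperature and salt), and the site sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the strand that base pairs with seq, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def binds(strand, site):
    """Simplification: a strand pairs with a site only if it is its exact reverse complement."""
    return strand == reverse_complement(site)


def and_gate(site_a, site_c, inputs):
    """Output D (True) only when inputs pair with both recognition site A and site C."""
    has_a = any(binds(s, site_a) for s in inputs)
    has_c = any(binds(s, site_c) for s in inputs)
    return has_a and has_c


# Hypothetical recognition-site sequences on the gate
SITE_A = "ACCTGAAG"
SITE_C = "GGATTCAA"
INPUT_A = reverse_complement(SITE_A)
INPUT_C = reverse_complement(SITE_C)

print(and_gate(SITE_A, SITE_C, [INPUT_A, INPUT_C]))  # both inputs present
print(and_gate(SITE_A, SITE_C, [INPUT_A]))           # input C missing
```

The first call returns True and the second returns False, mirroring the rule that output D appears only when A and C are both present.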


SUMMARY 


Much of the research in pharmaceutical biotechnology 
focuses on the overall goal of improving drug 
development and disease treatment. Success in drug 
development research requires a detailed understanding 
of the biological targets of drugs in the cells. Drug 
design also relies on human genome sequence information 
and a thorough appreciation of the intricacies of protein 
structures and the protein changes responsible for 
different diseases and disorders. One approach is to design 
a drug that can mimic the structure of a protein that 
normally binds to a key protein in the disease pathway. 
Once the imposter binds to the key protein, that protein 
can no longer participate in causing the disease. This 
result would indicate that the imposter compound might 
make an effective drug to treat the disease in question. 
The genomics and proteomics fields both generate 
important resources for researchers in pharmaceutical 
biotechnology, because research in these fields focuses 
on understanding the structure and function of genes 
and exploring how the gene products interact with 
each other and with environmental factors. Scientists 
use microarray technologies to find out which genes 
are expressed in healthy compared with diseased 
cells, or to determine how a particular drug affects the 
expression of large numbers of genes. These questions 
are central to pharmaceutical biotechnology researchers 
because genes and proteins have a central role in 
human disease processes and in drug treatments. 

Like many areas of science, pharmaceutical biotechnology 
continues to benefit from the amazing and intriguing 
new field of nanotechnology, which has created carbon 
nanoparticles that provide new ways to deliver drugs to 
specific targets in the body, DNA-nanotube biosensors 
that detect metal ions in biological samples, and nanohorn 
structures that offer a novel way to store hydrogen. Recent 
hybrid nanostructures such as the nanobud now provide ways 
to construct hybrid nanomachines. Carbon nanotubes are 
extremely strong yet lightweight, and they can exhibit 
unexpected nanoscale characteristics such as 
superconductivity. The novel nanoscale properties of these tiny 
structures have the potential to produce a wide variety 
of useful nanoscale products with many potential 
applications in medicine, including nanomachines that may 
one day work inside living cells. 

Lab-on-a-chip technology has already produced the 
first handheld bioanalyzer devices, which could change the 
future of standard laboratory testing. Future devices may 
include special units for DNA testing at home 
and could improve forensic and other testing technologies. 

DNA computers are still very new, but their biomedical 
applications have a bright future. DNA nanocomputers 
that can function inside living human cells are under 
development. In the future, the combination of 
nanotechnology, DNA computers, and lab-on-a-chip technology 
may produce tiny implanted devices that monitor and 
control the levels of insulin and sugar in diabetic 
patients, patrol the body for pathogens or cancer cells, 
or provide the appropriate doses of analgesic drugs for 
the most effective pain control. 

The information available in an individual's DNA 
genome is rapidly becoming an important tool to improve 
diagnosis and treatment, and it is the foundation for the 
practice of personalized medicine. This field is very new, 
but it will continue to grow as more biotechnology 
companies offer the public access to information about 
their genes, genotypes, and possible genetic futures. The 
structural approach to designing new drugs is based on 
understanding the molecular structures of the biologi- 
cal drug targets inside the cell, and the characteristics of 
the active site on the target enzyme. Some types of new 
drugs are specifically designed to stop key interactions 
between a protein enzyme and its target. The fields of 
genomics, proteomics, personalized medicine, drug dis- 
covery, lab on a chip technology, nanotechnology, and 
nanomedicine are among the areas making important 
contributions to the growing field of pharmaceutical 
biotechnology. Although diverse, most of these scien- 
tific specialties focus on research to find and develop 
new drugs and to design more effective drug treat- 
ments. Researchers in the field of pharmaceutical bio- 
technology are working on developing new and better 
medicines and therapies to treat human diseases. In the 
future, pharmaceutical biotechnology will continue to 
have a significant impact on related research in person- 
alized medicine, targeted drugs, and nanoparticle-based 
delivery systems for anticancer drugs. 


REVIEW 


This chapter describes the field of pharmaceutical bio- 
technology and updates the status of scientific fields that 
contribute to drug development and disease treatments, 
including genomics, proteomics, personalized medicine, 
drug discovery, lab-on-a-chip technology, nanotechnol- 
ogy, and nanomedicine. To assess your understanding of 
these areas, answer the following review questions: 


1. What is the main focus of research in pharmaceu- 
tical biotechnology? 

2. Describe the impact of the human genome 
sequence and genomics on advances in the sci- 
ence of personalized medicine. 

3. Explain the role of protein structure in designing 
new drugs to treat a disease. 

4. Explain how changes in protein expression can be 
used to identify an early diagnosis biomarker for a 
disease. 

5. Describe how microarray and cDNA technologies 
are used to study differential gene expression in cells 
and to analyze the genetic results of drug treatments. 
6. Give an example of the new physical or chemical 
characteristics that are exhibited by nanostructures 
such as nanotubes. 

7. Explain which features can be visualized by the 
new 4D electron microscope that cannot be 
observed using typical electron microscopy. 

8. Describe the important role that 3D protein struc- 
ture plays in the strategy used to design new drugs. 

9. Explain the relationship between a single nucleotide 
polymorphism (SNP) in an individual’s genome and 
the chance of inheriting a genetic disease. 

10. Explain how the use of handheld lab-on-a-chip 
devices will change routine patient health care in 
the future, or be an advantage for forensic scientists. 
Feline Fluorescence 


Newsday, December 15, 2007 

By Cho Jin-seo 

Staff Reporter, Korea Times 

South Korean scientists have cloned cats that glow 
red when exposed to ultraviolet rays, an achievement that 
could help develop cures for human genetic diseases, 
the Science and Technology Ministry said. Three Turkish 
Angora cats were born in January and February through 
cloning with a gene that produces a red fluorescent pro- 
tein that makes them glow in the dark. One died at birth, 
but the two others survived, the ministry said. The minis- 
try claimed it was the first time cats with modified genes 
have been cloned. Scientists from Gyeongsang National 
University and Sunchon National University took skin cells 
from a cat and inserted the fluorescent gene into them 
before transplanting the genetically modified cells into 
eggs (Figure 14.1). 


FIGURE 14.1 Fluorescent feline face! The cloned Turkish Angora kitten (left) gives off a red fluorescent glow, whereas an ordinary kitten (right) appears green because this picture was taken under ultraviolet light. The genome of the cloned Angora kitten contains a gene that was modified to produce a red fluorescent protein. 


LOOKING AHEAD 


This chapter describes the science behind animal bio- 
technology, which encompasses the strategies and 
methods used to change the genetic makeup of living 
animals, including humans. A major goal of this work 
is to provide animal test models to better understand 
and treat human diseases. In addition, these methods 
can be used to create animals to produce medically 
useful proteins and to develop appropriate animals to 
provide transplantable organs such as kidney, heart, or 
lung for human recipients. 

On completing the chapter, you should be able to 
do the following: 


• Describe the major methods used to create transgenic animals. 

• Explain how a functioning gene can be removed 
from an animal to create a “knockout” organism. 



• Describe the use of animals carrying tissue-specific 
gene knockouts to understand the role of different 
genes in human development. 

• Explain why scientists might decide to incorporate 
a transgene or a knockout gene into the germ line 
cells of an animal, and not the somatic cells. 

• Identify five ways in which transgenic, knockout, 
and conditional knockout animals contribute to 
understanding human diseases. 

• Explain how transgenes are expressed in the 
host animals created to make useful products, 
including protein drugs. 

• Describe the ethical arguments for and against 
the xenotransplantation of animal organs to treat 
human patients. 

• Explain the steps used in reproductive and 
therapeutic cloning and distinguish between the goals 
and purposes of each approach. 

• Describe the concerns raised about animal 
biotechnology; in particular, describe the potential 
problems associated with recombinant DNA products 
that may enter the food chain. 


INTRODUCTION 


The first efforts to alter the genetic information of an 
animal to make the animal more useful for biomedi- 
cal research began long ago with the most common 
laboratory animal, the white mouse. Years of research 
in molecular genetics revealed that the fundamental 
rules governing gene expression and protein synthesis 
in bacteria are very similar to the rules governing the 
same processes in animal cells but with some impor- 
tant differences. In early experiments, when research- 
ers inserted a ribosomal RNA gene from frogs into 
bacterial cells, they were surprised to find that the frog 
gene was expressed and the frog ribosomal RNAs were 
produced in the bacteria. 

The possibility that scientists could move genes 
from one organism into another organism, and from 
one species into another species, became the 
foundation for modern biotechnology research. This work 
is aimed at studying the functions of human genes 
in animal cells, with the goal of producing therapeutic 
proteins and other useful biological products by 
inserting foreign genes into the genomes of 
appropriate animals. In 1981, a rabbit β-globin gene, 
which encodes one of the major proteins in red blood 
cells, was introduced into mouse egg cells. The 
resulting offspring were mice that produced the rabbit 
β-globin protein in their red blood cells. In a different 
experiment, the rat growth hormone gene was 
inserted into mouse oocytes, and the offspring grew 
into especially large adult mice (Figure 14.2). 
FIGURE 14.2 Rat growth hormone influences the body size 
of mice and rats. The transgenic mouse on the left has an active rat 
growth hormone gene inserted into its genome, and it is twice the 
size of the normal mouse on the right. 


Modern animal biotechnology focuses on the creation of 
genetically engineered animals that will aid in the under- 
standing and treatment of human diseases and will make 
transgenic animals that produce many proteins, drugs, and 
other products to benefit humans. 


TRANSGENIC ANIMALS ARE GENETICALLY 
ALTERED 


Transgenic organisms, including bacteria, plants, and 
animals, are constructed by inserting one or more foreign 
genes into the genome of the recipient organism. 
Vectors are commonly used to carry a foreign gene 
into the genome of animal cells. The foreign gene 
carried by a transgenic animal is a transgene. Small 
transgenic animals such as laboratory mice were initially 
constructed for research studies. The first successful 
large transgenic animals were created in 1985, when 
transgenic rabbits, pigs, and sheep were produced 
carrying the gene for human growth hormone (HGH). The 
HGH gene DNA was integrated into the chromosomal 
DNA in all three species, and the HGH protein was 
successfully produced in transgenic rabbits and pigs. 
Since then, transgenic cattle expressing human growth 
hormone have been produced. 


Changing Genes in an Animal Genome 


The first challenge to overcome when creating a 
transgenic animal is to determine the best strategy to safely 
change the genomic DNA in every cell of the entire 
animal. This involves manipulating the germ line cells 
and embryonic cells of the animals to be altered. 
Scientists must know the location, structure, and 
function of the gene in the animal genome that is targeted 
for change. It is extremely important to use an 
appropriate DNA vector that will not only carry the 
transgene into the cell, but will also dictate the fate of the 
transgene DNA once it enters the nucleus. 


FIGURE 14.3 How to make transgenic animals. (A) Method 1: The mouse embryonic stem cells are first treated with the vector DNA carrying the transgene. Most cells treated with the DNA do not pick up the DNA, and as a result most of the cells in the experiment are killed when they are exposed to the antibiotic G418. A few cells pick up the vector DNA but insert it incorrectly into the genome, carrying the tk gene in with it. These cells are resistant to G418 but are killed by ganciclovir. Very few cells incorporate the transgene and the vector DNA correctly into the genome. Homologous recombination involves the exchange of vector DNA with similar (homologous) DNA sequences in the target animal genome. DNA exchange between homologous regions causes the transgene to be inserted accurately into the target site in the animal genome. The desired population of cells will grow in culture medium containing the drugs G418 and ganciclovir, and the cells that survive the selection process become enriched several thousandfold. (B) Method 2: The earliest transgenic animals were created by microinjecting the transgene DNA into one of the two pronuclei in the fertilized egg; the pronuclei are the egg and sperm nuclei before they fuse. Under a microscope, the scientist uses a glass pipette with an ultrathin tip to deliver the DNA into one pronucleus while the egg cell is held in place with suction from a blunt pipette. The resulting transgenic eggs are introduced into surrogate females to allow the transgenic embryo to grow and develop. 


A transgenic animal is genetically engineered to carry an 
additional gene (or genes) in the genome of all of its cells, 
but the transgene protein product might not be expressed 
in all of the cells. 


The first transgenic animals were created by 
microinjecting the transgene DNA into one of the two pronuclei 
(the egg and sperm nuclei before they fuse) in the 
fertilized egg (Figure 14.3). Working under a microscope, 
the scientist uses a glass pipette with an ultrathin tip 
to deliver a tiny amount of solution containing DNA into 
one pronucleus while the egg cell is held in place with 
suction from a blunt pipette. The resulting transgenic 
embryos are then introduced into surrogate females to 
allow the transgenic embryos to grow and develop. 

Embryonic stem cells (ESCs) were first isolated from 
mice and cultured in the lab in 1981 by Martin Evans 
and Matthew Kaufman of Cambridge University (see 
Chapter 12). Embryonic stem cells are undifferentiated 
cells that grow inside the hollow blastocyst embryo. 
The cells on the outside of the hollow blastocyst will 
eventually develop into the placenta, which is connected 
to the embryo by the umbilical cord and links the embryo 
to the mother. In comparison, the ESCs growing inside 
the blastocyst embryo have the ability to develop into all 
of the different types of cells in the body, including germ 
line cells, which produce eggs in females and sperm in 
males. Because they can develop into all of these cell 
types, undifferentiated ESCs are described as pluripotent 
(see Chapter 12). 

Major progress was made in the approach to cre- 
ating transgenic animals when scientists began to 
use genetically altered ESCs to generate transgenic 
embryos instead of relying on microinjecting DNA into 
pronuclei (see Figure 14.3). In this new method, the 
ESCs are removed from inside a blastocyst and grown 
in the lab and then treated with the vector DNA carry- 
ing the transgene. This vector also contains other genes 
and functional DNA elements including a gene that is 
expressed only if the vector integrates properly into the 
cell genome. Vectors typically carry antibiotic-resist- 
ance genes that allow only the cells carrying the vec- 
tor to grow in the presence of the antibiotic. 

The scientists then inject the treated ESCs carrying the vector/transgene DNA into the inner cell mass in blastocyst embryos of the same species. The transgenic blastocyst embryos are then transferred into the uterus of a surrogate female to carry the developing embryos until birth. Transgenic animals created in this way acquire only one copy of the transgene per cell because the transgene was introduced into only one chromosome in the cell. However, when two such transgenic animals are mated, about one in four of their offspring inherit two copies of the transgene, one on each chromosome of the homologous pair. 
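Under simple Mendelian inheritance, the outcome of mating two transgenic animals can be made precise. The sketch below assumes the transgene sits at a single site on one chromosome of each parent (one copy, written "T-") and that each parent passes on one chromosome of the pair at random; the function name and the allele notation are illustrative, not taken from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_copy_numbers(parent1, parent2):
    """Expected fraction of offspring carrying 0, 1, or 2 transgene copies.

    Each parent is written as the two versions of one chromosome pair:
    'T' = chromosome carries the transgene, '-' = it does not.
    Each parent passes one chromosome of the pair at random.
    """
    counts = Counter((a + b).count("T") for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {copies: Fraction(n, total) for copies, n in sorted(counts.items())}


# Two transgenic parents, each carrying a single copy ("T-")
print(offspring_copy_numbers("T-", "T-"))
# {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```

Under these assumptions, about one offspring in four carries two copies, half carry one copy, and one in four carries none, so the offspring must be tested to find the animals with two copies.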


Scientists created the first transgenic animals by injecting 
the altered gene DNA into the pronuclei immediately after 
fertilization. Later they used embryonic stem cells to gen- 
erate lines of transgenic animals for scientific research and 
the development of biomedical treatments. 


Specialized vectors play a significant role in creating custom-designed transgenic animals, in addition to controlling the delivery and expression of the transferred transgenes. The vectors are engineered not only to carry the transgene DNA into the cell and into the nucleus, but also to direct the fate of the vector DNA and the transgene inside the recipient cell. The functional DNA elements included on the vector are important components of the genetic selection strategy devised to select for the ESCs carrying the desired genetic alteration(s). These elements also determine whether the vector and transgene integrate into the genome, and they control the timing of the expression of the transgene product. 
Transgenic vectors used to produce transgenic 
animals often contain the neomycin resistance gene 
(neoʳ) and thymidine kinase gene (tk) as well as the 
transgene and other DNA elements (Figure 14.4). The 
neomycin resistance gene codes for an enzyme that 
inactivates the antibiotics neomycin and G418. The 
thymidine kinase gene codes for an enzyme that phos- 
phorylates the nucleoside analog, ganciclovir. During 
DNA replication, the phosphorylated ganciclovir is 
mistakenly incorporated into the replicating DNA, 
causing the cells expressing the thymidine kinase gene 
to be killed by the ganciclovir. 


Targeted Integration of Transgene DNA 


Certain DNA elements carried on the vector play important roles in determining the fate of the vector DNA once 
it reaches the nucleus of the recipient cell. Some vec- 
tors are designed to integrate directly into the chromo- 
some DNA, sometimes at random sites in the genome 
and sometimes at specific sites in the genome DNA, 
depending on the type of vector. In a few cells the vec- 
tor DNA is inserted into the target site in the genome 
using the rare process of homologous recombination. 
This occurs because the vector contains DNA regions 
that are similar (homologous) in DNA sequence to the 
targeted regions of the animal genome (Figure 14.4). 
The exchange of DNA sequences between homologous 
DNA regions in the vector and in the genome causes 
the transgene DNA to be inserted into a known target 
site in the recipient genome. 
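Gene targeting by homologous recombination can be pictured as a search-and-swap on DNA sequences. The following is a deliberately simplified string model, not a biochemical simulation: it assumes the vector is written as left homology region + insert + right homology region, that both homology regions match the genome exactly, and that the genome DNA lying between the two matching regions is replaced by the insert. The function targeted_replace, the short sequences, and the lowercase "neo" placeholder are hypothetical.

```python
def targeted_replace(genome, vector, arm_len):
    """Simplified model of targeted integration by homologous recombination.

    vector = left_arm + insert + right_arm, each arm arm_len letters long.
    If both arms are found in the genome (left before right), the genome
    DNA between them is exchanged for the insert. Returns None if the
    homologous regions are not found (targeting fails).
    """
    left_arm = vector[:arm_len]
    right_arm = vector[-arm_len:]
    insert = vector[arm_len:-arm_len]
    start = genome.find(left_arm)
    if start == -1:
        return None
    end = genome.find(right_arm, start + arm_len)
    if end == -1:
        return None
    return genome[:start + arm_len] + insert + genome[end:]


# Hypothetical genome: flanking DNA, left homology, target gene, right homology
genome = "AAAA" + "GATTACAG" + "TTTTTTTT" + "CCGGAATT" + "AAAA"
vector = "GATTACAG" + "neo" + "CCGGAATT"  # "neo" stands in for the insert
print(targeted_replace(genome, vector, 8))
```

The model captures only the outcome, which is why the homology regions matter: without matching sequences on both sides, the swap does not happen at that site.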

There are some drawbacks to integrating the transgene into an animal genome. For example, there would be negative consequences if the vector integrated into the coding region of a gene and disrupted expression of an essential protein. This would be a rare event, however, because human chromosomes are composed mostly of noncoding DNA sequences; only about 1 to 2 percent of the DNA codes for proteins (see Chapter 6). It is more likely 
that the expression of a transgene might be turned off 
or suppressed if the transgene DNA is inserted into 
regions of the chromosomes where genes are known 
to be permanently inactivated and are not expressed. 
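The argument that a random insertion rarely lands in a protein-coding sequence can be checked with a short calculation. Assuming insertion sites are spread uniformly across the genome (real integration is not perfectly uniform) and that roughly 1.5 percent of the DNA is protein-coding, the chance that at least one of n independent insertions hits coding DNA is 1 - (1 - f)^n, where f is the coding fraction. The function name and the 1.5 percent figure are assumptions for illustration.

```python
def chance_any_hit(coding_fraction, n_insertions=1):
    """Probability that at least one of n independent, uniformly placed
    insertions lands in protein-coding DNA."""
    return 1 - (1 - coding_fraction) ** n_insertions


ASSUMED_CODING_FRACTION = 0.015  # assumption: about 1.5% of the genome codes for protein

for n in (1, 5, 20):
    print(n, round(chance_any_hit(ASSUMED_CODING_FRACTION, n), 3))
```

This counts only hits in coding sequence; insertions into introns or regulatory regions can also disturb a gene, so the real risk of disrupting gene function is somewhat higher than this estimate.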

Vectors that are packaged into virus particles are 
particularly efficient at transfecting cells because these 
vectors mimic the processes normally used by infec- 
tious viruses. Most viral vectors have limitations on the 
length of the DNA that can be inserted into the vec- 
tors, usually because of the constraints imposed by 
packaging the vector into the viral capsid. To overcome 
this restriction, scientists constructed yeast artificial chromosome (YAC) vectors derived from budding yeast chromosomes, which can carry very large inserts of more than a million base pairs. The F plasmid (fertility factor) of E. coli is used to make bacterial artificial chromosome (BAC) vectors, which can accommodate inserts of up to about 300,000 base pairs (300 kb). These 
vectors also contain the additional DNA sequences 
needed to regulate transcription of the vector genes 
including the transgene. Progress has been made to 
ensure that the transgene will show accurate tissue- 
specific gene expression, an important feature of trans- 
genic gene expression. 


Specialized DNA vectors play an essential role in inserting 
genes into specific target sites in the genome by homolo- 
gous recombination. This natural process exchanges DNA 
between two molecules with similar (homologous) sequences, 
such as the equivalent regions of two homologous chromo- 
somes or, in gene targeting, the vector and the chromosome. 


Transgenic Knockout Animals 


A knockout gene refers to any gene that has been inactivated or deleted from a genome. A knockout animal is a specific type of transgenic animal that has had a specific gene inactivated or deleted from its genome. These gene knockout animals have been essential for a variety of studies designed to determine the role of specific genes in cell growth and development in animals. Many genes in diverse organisms such as humans and mice encode similar proteins. Scientists can investigate the functions of human genes and proteins by expressing the human protein in different organisms. 

Transgenic knockout animals are produced using special vectors engineered to carry out targeted integration of the vector DNA into a specific location in the genome, instead of inserting into random genome locations. The vector carries the mutant (knockout) version of the gene to be deleted and a gene that encodes resistance to antibiotics such as neomycin. In the vector, these two genes are flanked on both sides by regions of DNA that are similar (homologous) to the sequences of the DNA regions that flank the target gene in the genome. The regions of similar DNA sequences in the vector and the genome direct the process of recombination (crossing-over) so that the vector DNA is inserted into the target site in the genome. In the case of a knockout gene, the vector DNA is inserted in such a way that the wildtype gene is disrupted or replaced by the neomycin resistance gene. The thymidine kinase (tk) gene, which makes cells sensitive to the drug ganciclovir, is placed outside the homologous regions of the vector, so it is lost when the vector integrates by homologous recombination but is usually retained when the vector inserts at a random site (see Figure 14.4). 

The same approach was used to produce transgenic knockout mice, with each knockout strain lacking a particular mouse gene. After the vector has been introduced into mouse embryonic stem cells (ESCs), the cells are grown in the presence of two drugs, G418 (an antibiotic related to neomycin) and ganciclovir, which simultaneously impose positive and negative selection on the growth of the cells. Only cells that carry the neomycin resistance gene survive G418, whereas cells that still carry the tk gene, usually because the vector inserted at a random site rather than by homologous recombination, are killed by ganciclovir and represent unwanted transformants. The potential ES knockout cells that continue to grow in culture are injected into three- to five-day-old mouse blastocyst embryos, which are then transferred into the uterus of a surrogate mouse. The knockout mice generated by this process are heterozygous for the knockout gene because the knockout vector integrates into only one of the two chromosomes carrying the target gene. The knockout offspring are then bred to produce second-generation transgenic animals with homozygous knockout genes. However, if the gene that is inactivated by the knockout mutation is normally essential for survival, the heterozygous mice will survive but homozygous knockout mice will not. 

FIGURE 14.4 Transgenic vectors are used to generate transgenic animals. The neoʳ gene encodes an enzyme that inactivates neomycin and G418. The tk gene encodes thymidine kinase, an enzyme that phosphorylates the analog ganciclovir, which is incorporated into newly replicated DNA; as a result, ganciclovir kills cells expressing the tk gene. Shaded sequences are regions of DNA homology (have similar DNA sequences). 
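The positive-negative selection step can be summarized as a simple decision rule. This sketch assumes the standard vector layout, in which the neomycin resistance gene lies inside the homologous regions and the tk gene lies outside them, so that homologous recombination keeps neo but loses tk, while random integration usually keeps both. The Clone class and the three example cell types are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    name: str
    has_neo: bool  # neomycin resistance gene present in the genome
    has_tk: bool   # thymidine kinase gene present in the genome


def survives_selection(clone):
    """Apply both selections at once.

    Positive selection: G418 kills cells that lack the neo gene.
    Negative selection: ganciclovir kills cells that express tk.
    """
    survives_g418 = clone.has_neo
    survives_ganciclovir = not clone.has_tk
    return survives_g418 and survives_ganciclovir


clones = [
    Clone("no vector taken up", has_neo=False, has_tk=False),
    Clone("random integration", has_neo=True, has_tk=True),
    Clone("homologous recombination", has_neo=True, has_tk=False),
]
for c in clones:
    print(f"{c.name}: {'survives' if survives_selection(c) else 'killed'}")
```

Only the clone that integrated by homologous recombination survives both drugs. In practice the rule is not perfect (a randomly integrated vector can occasionally lose its tk gene), so the surviving clones are enriched for correct targeting and are then checked before use.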


Specialized DNA vectors are used to construct transgenic 
knockout animals that have had a selected gene(s) com- 
pletely deleted from their genome DNA. The knockout 
vector inserts into only one of the two chromosomes, mak- 
ing the first generation mice heterozygous for the knock- 
out allele (wt/ko). With appropriate breeding, the second 
generation mice can be homozygous for the knockout 
mutation and carry knockout alleles on both chromosomes 
(ko/ko). 


Controlling Transgene Expression in 
Transgenic Animals 


In 2006 the Nobel Prize in Physiology or Medicine was awarded to Andrew Fire of Stanford University and Craig Mello of the University of Massachusetts Medical School for discovering RNA interference (RNAi). This biological mechanism involves small RNA molecules, such as small interfering RNAs (siRNAs), that can turn off the expression of selected genes (see Chapter 11). siRNA technology provides scientists with an extremely powerful tool to explore the functions of specific proteins in the cell, and it could potentially be used to abolish the function of disease genes in patients. 

Fire and Mello first showed that double-stranded RNA molecules interfere with gene expression in the roundworm C. elegans. RNAi is thought to be part of a mechanism that protects cells against invading viruses. The discovery of RNAi opened up a new field of research that identified the many components involved in this complex protection mechanism. Perhaps even more important, however, was the rapid development of RNAi technologies, which were applied to a very wide range of scientific fields and experimental approaches. Although RNAi was discovered in worms, scientists have used RNAi technology to control gene expression in many organisms and during embryo development (Figure 14.5) (see Chapter 11). 
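Designing an siRNA (step 1 of Figure 14.5) relies on base pairing: one strand of the siRNA must be complementary to the target mRNA. Because paired RNA strands run antiparallel, the strand that pairs with a target written 5' to 3' is its reverse complement. The sketch below computes that strand for a short, made-up mRNA segment; it ignores the other rules used in real siRNA design (such as strand length, overhangs, and checks for unintended matches to other mRNAs), and the function names and sequence are hypothetical.

```python
# Watson-Crick pairing in RNA: A with U, G with C
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Return the RNA strand (written 5' to 3') that base-pairs with seq."""
    return "".join(RNA_PAIRS[base] for base in reversed(seq.upper()))


def can_pair(strand, target):
    """True if strand pairs perfectly, antiparallel, with target."""
    return reverse_complement_rna(target) == strand.upper()


target = "AUGGCUAGCUUAGGCAUCGAU"  # hypothetical 21-nucleotide mRNA segment
guide = reverse_complement_rna(target)
print(guide)                    # AUCGAUGCCUAAGCUAGCCAU
print(can_pair(guide, target))  # True
```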

The naturally occurring Cre-loxP DNA recombination system was adapted as a molecular tool to remove specific genes from a genome (Figure 14.6). The Cre gene codes for the Cre protein, a site-specific DNA recombination enzyme (recombinase) that acts only on DNA containing a specific 34-base-pair sequence called a loxP (locus of X-over P) site. The transgene DNA is inserted into the vector so that the gene is flanked by two loxP DNA sequences. Once this loxP-flanked DNA is in the genome, the Cre enzyme catalyzes recombination between the two loxP sites, causing the DNA region between them to be deleted from the animal genome and leaving a single loxP site behind. 
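The Cre-loxP deletion can be modeled as a cut-and-rejoin operation on a DNA sequence. The 34-base-pair sequence used here is the standard loxP site; the flanking sequences and the short "gene" are made up, and the model assumes the two loxP sites point in the same direction on one DNA molecule, the arrangement that leads to deletion. The cre_expressed flag is a hypothetical stand-in for whether the tissue-specific promoter driving Cre is active in a given cell.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # 34-bp loxP site


def cre_excise(dna, cre_expressed=True):
    """Model Cre recombination between two same-orientation loxP sites.

    Returns (remaining_dna, excised_dna). The DNA between the two sites
    is removed and one loxP site stays in the chromosome. (In the cell the
    excised piece forms a small circle that also carries a loxP site;
    here only the intervening DNA is returned.)
    """
    if not cre_expressed:
        return dna, ""  # no Cre enzyme in this cell: nothing happens
    first = dna.find(LOXP)
    if first == -1:
        return dna, ""
    second = dna.find(LOXP, first + len(LOXP))
    if second == -1:
        return dna, ""  # recombination needs two loxP sites
    excised = dna[first + len(LOXP):second]
    remaining = dna[:first] + LOXP + dna[second + len(LOXP):]
    return remaining, excised


target_gene = "ATGAAACCCGGGTAA"  # hypothetical target gene
floxed = "CCCC" + LOXP + target_gene + LOXP + "GGGG"

print(cre_excise(floxed, cre_expressed=False)[0] == floxed)  # True: gene kept
remaining, excised = cre_excise(floxed)
print(remaining == "CCCC" + LOXP + "GGGG")  # True: one loxP site left
print(excised == target_gene)               # True: gene removed
```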

Transgenic animals engineered using the Cre-loxP 
DNA recombination system have been used in tissue- 
specific gene expression studies to analyze genes and 
proteins that are expressed only in certain tissues. In 
these experiments a set of transgenic animals is con- 
structed containing the transgene of interest in the 
genome flanked on both sides by loxP DNA sequences. 
Some of these transgenic animals also carry the Cre gene inserted at a different site in the genome under the control of a tissue-specific promoter, so the Cre enzyme is expressed in only one type of cell in the body (green in Figure 14.6). A knockout mutation can be created in the genome by expressing the Cre enzyme in cells that carry the loxP-flanked gene; this happens only in the tissue where the Cre promoter is active. The targeted gene is deleted, creating the knockout mutation, when recombination occurs between the flanking loxP DNA sites (see Figure 14.6). The Cre-loxP system permits researchers to observe what happens when the targeted gene is deleted only from the genomes of specific tissues in the organism, revealing the roles of certain genes and proteins in tissue-specific cellular processes. In order for the genetic changes engineered in the genome by the Cre-loxP system to be inherited by the offspring, the knockout mutation must be introduced into the genomes of the germ line cells, which give rise to egg cells in females and to sperm cells in males. 

FIGURE 14.5 Gene expression controlled using siRNA (RNAi). Expression of selected transgenes can be silenced in live cells using siRNA technology. (1) Scientists make their own siRNAs that are complementary to the mRNAs they have targeted for destruction in the experiment. The custom-designed siRNAs are added to the cells (dotted arrow). (2) In the cell, the siRNA binds to special cellular proteins and forms an RNA-induced silencing complex (RISC). (3) The RISC-siRNA complex searches for mRNAs that can base pair with the siRNA. (4) The siRNA/mRNA hybrid triggers enzyme activity that destroys the target mRNA and prevents it from being translated into protein. 


The Cre-loxP recombination system has the advantage that 
it can be used to create transgenic animals carrying con- 
trollable knockout genes. This permits scientists to induce 
a knockout mutation and analyze the consequences of los- 
ing a key protein in many different circumstances. 
In the years since the first transgenic mice were 
developed, researchers have created tens of thousands 
of different strains of transgenic and knockout mice for 
use as research tools to study specific genes in human 
development, health and disease. The OncoMouse, for 
example, is a strain of mice engineered with a specific 
transgene that makes the mice especially susceptible 
to tumors, providing an excellent model system to test 
the effectiveness of potential anticancer drugs. The 
OncoMouse was developed at Harvard University by 
Philip Leder and Timothy Stewart and received the first 
animal patent ever issued in the United States. 

Pharmaceutical and biotechnology companies have 
developed thousands of transgenic and knockout mice 
strains for the highly competitive and profitable race to 
discover new drugs. A good example is the company 
Deltagen, which has generated a collection of more 
than 500 different knockout mouse strains. Each strain 
is missing a different gene that codes for a different 
enzyme or protein and can be licensed by the com- 
pany as a tool used to test new drugs. 

FIGURE 14.6 Cre-loxP can be used to delete genes from animal genomes. (A) The Cre gene (red) codes for the Cre protein, a site-specific DNA recombination enzyme that recognizes and acts on any DNA molecule containing the 34-base-pair loxP (locus of X-over P) sequence. (B) When the Cre recombinase enzyme is expressed in mammalian cells (green promoter), it catalyzes recombination between the loxP sites (pink), causing the DNA between those two loxP sites in the genome (target gene in blue) to be deleted. (C) A closeup of the DNA recombination event that releases the target gene from the genome. 

Studies on the G-protein-coupled receptors (GPCRs) are a good example of the use of knockout transgenes applied to members of a large family of important membrane proteins that activate the signal transduction pathways that control various cellular responses. GPCRs are involved in cellular responses to hormones, pheromones, light, and odors, and in the transmission of nerve signals. GPCRs are also known as seven-transmembrane domain receptors, heptahelical receptors, serpentine receptors, and G protein-linked receptors (GPLRs) (Figure 14.7). Researchers studying GPCRs created a collection of 236 mouse strains, with each strain carrying a different knockout mutation that removes a different G-protein-coupled receptor gene from the mouse genome. 

FIGURE 14.7 The seven transmembrane α-helix structure of a G-protein-coupled receptor. 

Many commonly used drugs elicit a cellular response through a biochemical pathway that involves GPCR proteins. For this reason, transgenic GPCR knockout mice have been developed for use in many strategies designed to search for new drugs. Initial drug searches used cells grown in the laboratory to identify compounds predicted to work by blocking the function of a specific GPCR protein. In subsequent experiments, the GPCR knockout animals were treated with a hormone stimulus, which caused measurable effects on the animals tested. The next test using the knockout mice was designed to find out whether candidate compounds block the effect of the hormone-induced stimulus. 


Transgenic animals play a key role in drug design and 
development by pharmaceutical companies. The biochem- 
ical pathways involved in a disease can often be blocked 
in knockout animals, giving researchers clues to types of 
new drugs that will mimic the knockout allele and allevi- 
ate disease symptoms. 


In 2007, Tabuchi and colleagues (University of Texas Southwestern Medical Center) reported studies on transgenic mice with mutations in the neuroligin-3 gene, which encodes a cell-adhesion molecule in the nervous system. One set of knockout mice had one copy of the neuroligin-3 gene deleted from the genome. The second set of transgenic mice had a single point mutation in the neuroligin-3 gene (R451C), which changes an arginine amino acid at position 451 to a cysteine amino acid in the neuroligin-3 protein. The same R451C neuroligin gene mutation was also found in the genomes of some people with autism, a spectrum of disorders that cause significant impairment of mental function and social behavior. The neuroligin R451C mutant mice showed impaired social interactions, and they exhibited an increase in inhibitory nerve transmission (Figure 14.8). This study suggests that changes in synaptic nerve transmission might contribute to autism in humans. The R451C mice will continue to be a useful model for studying autism behavior and possible treatments in mammals. 
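Mutation names such as R451C follow a compact convention: the one-letter code of the original amino acid, its position in the protein, then the one-letter code of the replacement amino acid. The helper below decodes that convention for simple substitutions; the function name is illustrative, and it does not handle other mutation types such as deletions or insertions.

```python
import re

# One-letter codes for the 20 standard amino acids
AMINO_ACIDS = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartate",
    "C": "cysteine", "Q": "glutamine", "E": "glutamate", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}


def describe_substitution(code):
    """Decode a substitution name: original residue, position, new residue."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a simple substitution: {code!r}")
    old, position, new = match.groups()
    if old not in AMINO_ACIDS or new not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code in {code!r}")
    return f"{AMINO_ACIDS[old]} at position {position} replaced by {AMINO_ACIDS[new]}"


print(describe_substitution("R451C"))
# arginine at position 451 replaced by cysteine
```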


FIGURE 14.8 Transgenic mice offer a new model for studying autism. Normal mice are interested in socializing with other mice (left), 
but the neuroligin R451C mutant mice ignore the other mice and behave as loners (right). 
PRODUCTS FROM TRANSGENIC ANIMALS: 
BIOPHARMING 


Transgenic animals are used routinely for research 
in many agricultural labs and for production by bio- 
technology companies. Agricultural and biopharma- 
ceutical scientists envision a future with transgenic 
livestock as the major sources of many therapeutic pro- 
teins, and as an approach to generate livestock animals 
that are resistant to economically important diseases. 

The development and marketing of transgenic ani- 
mals and plants that produce therapeutic proteins and 
commercial products is often referred to as pharming 
or biopharming. First scientists cloned and expressed 
recombinant proteins in bacterial cells like E. coli. 
Then scientists discovered that it is essential for many 
recombinant eukaryotic proteins to be expressed in 
mammalian cells to ensure that the transgenic recom- 
binant proteins will be correctly posttranslationally 
modified by the addition of the appropriate carbohy- 
drates (sugars) and other chemical groups. Mammalian 
cells also provide the appropriate environment for the 
chains of amino acids to fold properly after synthesis 
and form the 3D protein structures required so that the 
protein can function correctly in the cells. 

A number of research groups have produced stable 
lines of transgenic livestock including cows, sheep, 
and goats, which secrete the transgene products in the 
milk. In these transgenic animals the expression of the 
transgene was under the control of the transcriptional 
promoter for the casein gene that normally drives 
the expression of proteins that are secreted into milk. 
The industrial scale of transgenic livestock manage- 
ment and the development of methods to purify the 
desired transgene proteins presented many challenges. 
However, since the late 1990s a number of companies 
have established commercially viable transgenic pro- 
tein production programs using goats, cattle, pigs, rab- 
bits, and chickens. 

The medical applications of transgenic animal products have been slow to be realized. The only transgenic protein made in animals that has been approved for therapeutic use in humans is ATryn (GTC Biotherapeutics), which was given market authorization in August 2006 by the European Commission. In 2007 ATryn received Orphan Drug designation from the Food and Drug Administration (FDA); in the United States, orphan drugs are used to treat rare conditions that affect fewer than 200,000 people. ATryn is a recombinant form of the human antithrombin protein, which is used to prevent blood clots during surgery in patients with hereditary antithrombin deficiency. These people lack sufficient amounts of an important blood protein that normally inhibits blood clotting. Transgenic animals have been used to produce several clotting factor proteins that are easily harvested from milk but are very difficult to isolate in sufficient quantities from human sources. Additional 
biopharming projects that are based on the production 
of specific proteins in transgenic rabbits and chickens 
are still in the early research stages or are currently in 
early clinical trials. Examples include the Granulocyte- 
Colony Stimulating Factor (G-CSF), which was first dis- 
covered because of its ability to stimulate the growth 
and maturation of granulocyte cells, an important 
type of blood cell that fights infection. A recombinant form of human Granulocyte-Colony Stimulating Factor (rhG-CSF) protein was produced in bacterial cultures and approved for use in the United States in 2002. In addition, researchers at the French National Center for Scientific Research produced recombinant human G-CSF (rhG-CSF) protein secreted into the milk of transgenic goats. A Dutch company, Pharming, has used 
transgenic cows to produce fibrinogen, a key blood- 
clotting protein, which was released into the cow’s 
milk. This company is also planning to produce a pro- 
tein sealant in cow’s milk that can prevent excessive 
bleeding of wounds during surgical procedures or after 
traumatic injury. In 2007 Pharming received the FDA's 
Orphan Drug designation for this product. 

The overall success of the effort to produce drugs 
in transgenic animals remains uncertain. In 2007, the 
Council for Agricultural Science and Technology listed 
14 companies and research groups in North America 
and Europe with projects aimed at producing bio- 
logical products using transgenic livestock. However, 
recently at least two companies have gone out of business, some have been purchased by larger companies, and others have abandoned the research projects that rely 
on transgenic animal technology. Research groups and 
biotechnology companies face additional challenges 
to overcome for the transgenic industry to continue, 
including the need to establish disease-free transgenic 
animal herds that stably produce proteins. In addi- 
tion, better and more cost-effective protein purifica- 
tion methods are needed, which meet the regulatory 
requirements of the FDA and the comparable agencies 
in Europe, the United Kingdom, and other developed 
countries. 

Despite these challenges, the potential offered by transgenic biopharming is substantial. Using transgenic goats, GTC Biotherapeutics produced 2 grams of recombinant human antithrombin (AT) protein in every liter of milk. About half of the AT protein was recovered after purification from the milk, and the AT protein produced by the goats was found to function as well as the antithrombin protein made by human cells. Recombinant proteins produced in bacterial, yeast, or mammalian cell cultures must meet high standards for purification that remove viruses and dangerous bacteria. A big concern is the possibility that transgenic proteins made in animals might become contaminated with prions, the protein-based infectious agents that cause scrapie, a lethal neurological disease of sheep and goats, or mad cow disease. 

FIGURE 14.9 Mastitis is a common infection in cows caused by bacteria such as Staphylococcus aureus and streptococci. 

In the late 1990s, the increased interest in the use 
of transgenic animals to produce biopharmaceuticals 
was driven by the growth of the biotechnology industry 
and the fact that a facility to produce therapeutic pro- 
teins using laboratory cell cultures can cost more than 
$500 million. In comparison, transgenic dairy goats 
are much more cost effective—establishing a herd of 
transgenic goats costs less than a half million dollars. 
Dairy cows can produce 10,000 liters of milk per year 
so that tens of kilograms of transgenic protein can be 
produced by a single transgenic cow. 
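The yield figures quoted in this section can be combined into a rough estimate. As an assumption for illustration only, the sketch applies the 2 grams per liter expression level and the roughly 50 percent purification recovery reported for antithrombin to the 10,000 liters of milk that a dairy cow can produce in a year; actual yields depend on the protein, the animal, and the purification process. The function name is hypothetical.

```python
def protein_per_animal_kg(milk_liters_per_year, grams_per_liter, recovery_fraction):
    """Purified protein obtained from one animal per year, in kilograms."""
    grams = milk_liters_per_year * grams_per_liter * recovery_fraction
    return grams / 1000  # grams -> kilograms


# Assumed inputs: 10,000 L milk/year, 2 g protein per liter, ~50% recovery
print(protein_per_animal_kg(10_000, 2.0, 0.5))  # 10.0
print(protein_per_animal_kg(10_000, 2.0, 1.0))  # 20.0 (no purification loss)
```

With these numbers a single animal would yield roughly 10 to 20 kilograms of protein per year, in line with the estimate of tens of kilograms given in the text.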

Transgenic animals are also used to improve the 
agricultural, commercial, and nutritional value of nat- 
ural animal products. For example, the composition of 
animal milk can be altered to improve the growth and 
survival of offspring, and other approaches are used to 
improve the nutritional value of the transgenic animal products. The fat and cholesterol composition of the meat produced by livestock could be altered in transgenic animals to make the meat healthier for the human diet. Transgenic animals can also help to achieve better resistance to bacterial diseases on the farm. Transgenic cows make milk containing lysostaphin, an enzyme that kills the dangerous bacterium Staphylococcus aureus. These transgenic cows are more resistant than normal cows to mastitis, a serious infection in cows caused by S. aureus bacteria (Figure 14.9). 
Scientists are also working to improve the quality of 
other animal products such as hair, wool, and fiber, 
made by transgenic animals. Recently, the protein sub- 
units that make up the strong silk fibers made by spi- 
ders were expressed in goats and the spider proteins 
were secreted in the goat's milk. 




Transgenic animals continue to be used to make 
important discoveries in basic research and in the 
search for new drugs. Genetically altered animals can 
potentially provide cost-effective means to make many 
products with medical applications. Each transgenic 
product must be documented as safe before it can 
be used in humans. As with drugs and recombinant 
proteins produced by cells in culture, the evidence 
supporting safety and efficacy of animal transgene 
products must be developed and evaluated on a case- 
by-case basis. 


Transgenic animals continue to have an important role in 
producing specialized animal products for human use, 
ranging from drugs to fibers. 


Monoclonal Antibodies Make Disease-Specific Drugs 


Transgenic animals have had an impact on immunology, an important field that focuses on the physical, chemical, and physiological characteristics of the components of the human immune system in healthy and diseased states. Malfunctions of the immune system cause serious immunological disorders such as autoimmune diseases, hypersensitivities, immune deficiency, and transplant rejection. 

Antibodies are key proteins in the immune system with central roles in preventing and fighting disease. The human body makes antibodies throughout life in response to natural exposure to infectious agents in the environment. The immune system is also responsible for the effectiveness of the vaccines used to immunize children and adults. Many vaccines are made using a protein or part of a protein from an infectious bacterium or virus, which triggers the immune system cells to generate antibody proteins that recognize the specific bacterial or viral proteins as antigens (also called immunogens). 
Humans make five types of antibodies, known as immunoglobulins: IgG, IgA, IgD, IgE, and IgM. Antibodies are Y-shaped molecules composed of two identical long proteins (Heavy or H chains) and two identical short proteins (Light or L chains) (Figure 14.10). Antibodies function by recognizing and binding to specific antigens in a lock-and-key fashion, forming extremely tight antigen-antibody complexes (see Figure 14.10). Antigens have physical and chemical characteristics that elicit an immune response in the body. Antibody-antigen complex formation triggers one of several processes that lead to the destruction or inactivation of the antigen (bacterium or virus). For example, the antibody binds to the outside of a bacterial cell and acts like a “tag,” making the bacterium more easily detected by phagocytes and disposed of by phagocytosis. Some antigens are toxins, which are inactivated when bound by an antibody that acts as an antitoxin. By the same mechanism, a virus (antigen) that is bound by antibodies can no longer enter the host cells and is no longer infectious or dangerous. 

FIGURE 14.10 Structure of an antibody. (A) Antibodies are Y-shaped molecules composed of two identical long proteins (Heavy or H chains) and two identical short proteins (Light or L chains). (B) Antigens bind to the antigen binding site at the end of the “arms” of the Y. (C) Antibody-antigen complexes form to promote a specific immune attack. 

The human immune system produces a complex mixture of different antibodies in response to infection or vaccination. When the body mounts an immune response to an antigen, many different B cell lines are activated to produce a mixture of different (polyclonal) antibodies directed against many different regions of the antigen. 

Antibodies play an important role in basic research 
as research tools that are used to purify and study 
proteins of interest. Scientists often make antibodies 
(“raise” antibodies) against a specific protein antigen 
and then take advantage of the tight binding interac- 
tion between the antibody and the antigen to study 
the normal function of the antigen protein in the cell. 
They can find out which cells in the animal express 
the protein of interest by using immunoprecipitation 
and protein Western blots (see Chapter 5). Researchers 
also use antibodies to identify other proteins in the cell 
that interact with the protein of interest by co-immu- 
noprecipitation assays that can detect physical binding 
between proteins in the cells. 

Antibodies that bind with high specificity to a target
molecule (antigen) have the potential to be engineered
to make powerful drugs. Antibodies raised against a viral
coat protein can neutralize a virus or kill a virus-infected
cell. Scientists developed techniques to isolate and grow
specific antibody-producing cells in the lab, so that each
cell line produces only one specific monoclonal antibody
(mAb), which recognizes and binds to only one specific
antigen (protein, bacteria, virus, etc.) (Figure 14.11).
Monospecific monoclonal antibodies have been developed
as drugs to treat cancer. Each monoclonal antibody binds
to one specific chemical region on an antigen molecule,
called an epitope. To produce monoclonal antibodies,
mice are first injected with an immunogen (antigen),
usually a protein or a part of a protein that will elicit
antibody production. The antibody-producing cells are
removed from the mouse spleen and fused with tumor
cells to make a population of hybridomas, which are
screened to identify the specific cells that are producing
the desired antibodies. A single cell is placed into liquid
growth medium in each well of a multiwell plastic culture
dish, and the cells grow into clones containing many
identical cells in each well. The medium in each well is
tested for the presence of antibodies that react
specifically with the antigen. The cell cultures that
produce the desired monoclonal antibody are grown in
large cultures, and the antibody proteins are harvested
from the culture medium and purified. This process is a
form of cell cloning because a population of genetically
identical cells is generated from a single cell. The
antibodies made in this way are termed “monoclonal
antibodies.”

FIGURE 14.11 (A) To produce monoclonal antibodies, mice are first injected with an immunogen (antigen), which is a protein or a part of a protein that will elicit an immune response. (B) The antibody-producing cells are removed from the spleen and fused with tumor cells to make hybridomas. (C) Hybridoma cells are screened to identify cells that are producing antibodies. (D) These cells are placed separately into each well of a multiwell plastic culture dish. In each well the cells divide into clones of cells, and the culture medium is tested to detect production of antibodies that react specifically with the antigen (immunogen). The cell cultures that produce the desired monoclonal antibodies are grown in large cultures, and the antibody proteins are harvested and purified. These cells are called clones because they are genetically identical cells, generated from the single cell in each well.

Monoclonal antibodies have several advantages
over polyclonal antibodies in research applications
and for some medical purposes, but polyclonal
antibodies continue to have an important role.
Monoclonal antibodies are monospecific, so they bind
only to the antigen used to raise that particular
monoclonal antibody, whereas polyclonal antibodies
bind to an array of different antigens. Monoclonal
antibodies are made by cloned cell lines that can be
stored and used at any time to generate more identical
cells making the same monoclonal antibody. Polyclonal
antibodies are made by a collection of cells that cannot
be regenerated once the initial collection of cells has
been depleted.


The key feature of antibodies is their ability to bind
tightly and with high specificity to particular antigens.
Polyclonal antibodies are a mixture of antibodies that
bind to a range of different proteins and antigens. In
contrast, each monoclonal antibody binds only to one
specific antigen.
FIGURE 14.12 Humanized antibodies. (A) Schematic picture shows a chimeric antibody with the rodent heavy chain and light chain [white] attached to the human constant regions [black]. The fully humanized antibody contains predominantly human protein sequences except in regions required to maintain the specific binding properties of the rodent antibody [white].


Monoclonal antibodies made in mice have proven to 
be very useful for research applications and diagnostic 
testing, but the application of these antibodies as drug 
treatments for human diseases has been limited. In 1986, 
the first monoclonal antibody drug approved by the 
FDA for use in humans was a mouse antibody (OKT3) 
directed against a protein on the surface of immune cells 
that is involved in organ rejection. Initially the OKT3 
antibody was used effectively to prevent or treat tissue 
rejection in human patients, but repeated treatments 
were not safe because the mouse monoclonal antibody 
stimulated the human body to raise an immune response 
against the mouse antibody proteins. In this response, 
the antibody proteins made by the patient bound to the 
mouse monoclonal antibody proteins, causing the mon- 
oclonal antibodies to be cleared from the blood and 
removed from the body through the liver and spleen. 

One approach to avoid this immune response against
the mouse antibodies was to engineer the mouse
antibody proteins to appear more like human antibody
proteins. Recombinant DNA techniques were used to swap
certain parts of the mouse heavy and light protein chains
for the equivalent human protein sequences (Figure 14.12)
to make chimeric and humanized antibodies. These hybrid
proteins consist of both mouse and human protein
sequences and preserve the immunogen-specific binding
capacity of the mouse antibody. Humanized monoclonal
antibodies are much less likely to cause an immune
reaction in human patients and have been approved for
human treatments. The first such engineered antibody
approved to treat cancer was rituximab, a chimeric
mouse-human antibody raised against a specific protein
found on the cancer cells in a type of lymphoma (lymph
node cancer); it proved effective as a clinical treatment
for this cancer. Herceptin is a humanized monoclonal
antibody that targets the Her2 protein located on the
surface of many breast cancer tumors. Herceptin is an
effective breast cancer treatment when used together
with chemotherapy in patients who have Her2 positive
(Her2+) breast cancers (see Chapter 9).

The mouse protein sequences remaining in the
structure of humanized antibodies can still cause
an unwanted antibody response. To overcome this
problem, scientists are working on different ways to
generate human monoclonal antibodies for treatment
purposes. In one method, the cells that form the human
immune system in the bone marrow and thymus are
transplanted into mice that are genetically unable to
develop their own immune systems. After the human
immune system cells are established in the mice, the
mice are immunized with the desired protein and they
make antibodies that are, in molecular terms, human.
These antibody-producing cells can be isolated and
cloned as described previously to produce monoclonal
antibodies that are unlikely to provoke an immune
response in human patients.


Monoclonal antibodies are particularly useful because
they are made by a pure population of cloned cells
producing only one type of antibody, all of which bind
tightly to the same antigen.


XENOTRANSPLANTATION 


Xenotransplantation involves transplanting organs
from animals into humans to treat organ failure, an
approach that could reduce the waiting times and
increase survival of human patients who need organs.
The first successful human kidney transplant was
performed in 1954 and worked because the donor and
recipient were genetically identical twins. A genetic
match between donor and recipient is necessary because
certain genetic differences cause the recipient’s immune
system to attack and kill the foreign transplanted tissue.
In 1954 there were no drugs to effectively prevent
transplant rejection, but in 1983 the first big advance in
immunosuppressive drug therapy came with the approval
of cyclosporine. Other immunosuppressive drugs were
introduced in the following years, allowing successful
transplants involving liver, heart, heart-lung, single
lung, living donor liver, and living donor lungs.

Currently there are more than 100,000 patients on 
the organ transplant waiting lists that are maintained 
by the United Network for Organ Sharing (UNOS). 
This nonprofit organization administers the Organ 
Procurement and Transplantation Network (OPTN) that 
was established by the United States Congress in 1984. 
Worldwide there are not enough organ donors to meet 
the transplant need, even including living donors who 
can safely donate a kidney, a part of the liver, or a
single lung. Although the numbers and percentages vary
from year to year and from organ to organ, approximately
4% of the people waiting for a kidney and 10% of those
needing a different organ will die while waiting for an
organ transplant. This problem continues to get worse,
with no solution to the organ shortage in sight, while
the waiting list of patients in organ failure continues to
grow, doubling in the 9 years between 1996 and 2005
(Figure 14.13).

FIGURE 14.13 Patients on kidney waiting list and reported deaths by year, 1996 to 2005.
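The statement that the waiting list doubled in the 9 years between 1996 and 2005 can be turned into an average yearly growth rate. The short Python sketch below is only an illustration: it assumes smooth, constant (compound) growth, which real waiting-list counts only approximate, since the yearly values in Figure 14.13 fluctuate. The function names are hypothetical.

```python
import math


def implied_annual_growth(fold_change: float, years: float) -> float:
    """Constant yearly growth rate that multiplies a count by fold_change over `years` years."""
    if fold_change <= 0 or years <= 0:
        raise ValueError("fold_change and years must be positive")
    # Solve (1 + r) ** years == fold_change for r.
    return fold_change ** (1.0 / years) - 1.0


def doubling_time(annual_rate: float) -> float:
    """Years needed for a count to double at a constant yearly growth rate."""
    if annual_rate <= 0:
        raise ValueError("annual_rate must be positive")
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # Doubling between 1996 and 2005, as stated in the text.
    rate = implied_annual_growth(2.0, 9)
    print(f"average growth: {rate:.1%} per year")
    print(f"doubling time at that rate: {doubling_time(rate):.1f} years")
```

Doubling in 9 years corresponds to roughly 8% growth per year; read the other way, a list that grows by about 8% each year doubles in about 9 years.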


XenoMouse 


In another approach to avoid an immune response
against antibody drugs, the XenoMouse was engineered
to produce human antibodies using mouse embryonic
stem cell technology. The XenoMouse was developed to
generate fully human antibody drugs that can be used
to treat a wide range of human diseases (Figure 14.14).
The XenoMouse technology was developed by JT America
Inc. and purchased by the biopharmaceutical company
Abgenix, Inc., of Fremont, California. After a XenoMouse
is immunized, its cells that produce the desired human
antibody can be isolated and cloned to generate cell
lines. An example of a monoclonal antibody drug
produced using this approach is panitumumab (marketed
by Amgen as Vectibix), which specifically binds to the
epidermal growth factor receptor protein (EGFR) and is
effective in treating certain cancers, including colon
cancer.


Monoclonal antibodies that are partly humanized or have 
entirely human sequences have proven to be extremely 
useful in medical therapies, in diagnostic tests, and as 
highly specific research tools. 


FIGURE 14.14 Constructing the XenoMouse. The transgenic XenoMouse (Abgenix, Inc.) was developed to generate fully human antibodies for use as drugs to treat a range of human diseases.


Pigs are the Pick for Human Organs


Faced with a severe lack of human organ donors, and
building on the reported success of genetically altered
transgenic animals, scientists worldwide have been
developing genetically altered animals as alternative
organ donors. Pigs share organ size, anatomy, and basic
physiology with humans, and pigs have been genetically
altered to reduce the potential for rejection of pig organs
by human transplant recipients. Unlike human cells, pig
cells display a specific sugar (1,3-alpha-galactose) on their
surfaces that is recognized by antibodies present in the
human body. An organ grown in a pig and transplanted
into a human would be attacked and destroyed by these
antibodies. In the human body, antibodies bind to the
1,3-alpha-galactose sugars on the pig cells, which triggers
the complement cascade and eventually kills the foreign
pig cells. When triggered by antibodies, a series of more
than 30 complement proteins launch a stepwise process
that leads to the assembly of protein complexes that make
holes in the cell membrane, killing the pig cells. This
process, called complement activation, can lead to the
death of the pig cells transplanted into the human body.
However, organs transplanted from transgenic pigs
carrying one of three different genes for human proteins
that inhibit complement activation survived for three
weeks after transplantation.

Different strategies have been developed to avoid 
rejection of the organs transplanted from pigs. In one 
approach, knockout pigs were created in which the 
gene coding for the enzyme that catalyzes the synthesis
of the 1,3-alpha-galactose sugar was deleted from the
pig genome. This knockout mutation prevented the
production of the 1,3-alpha-galactose sugar in the
knockout pig cells and avoided this rejection mechanism,
suggesting that these animals might be appropriate
sources of organs for human transplantation. The
biopharmaceutical company Living Cell
Technologies found that coating the insulin-producing 
pancreatic islet cells from pigs with a complex sugar 
acted to protect the pig cells from attack and rejection 
by the human immune system. 

In a different approach, transgenic pigs were created
that express human proteins that inhibit the
complement cascade. The complement system con- 
sists of a number of small proteins called zymogens 
or proenzymes, which circulate in the blood as inac- 
tive enzyme precursors. A zymogen is a protein that 
requires a biochemical change for it to become an 
active enzyme. When stimulated by one of several 
trigger signals, protease enzymes in the complement 
cascade cleave specific proteins to release peptide 
cytokines (Greek cyto-, cell, and -kinos, movement). 
Cytokines are a large and diverse family of polypep- 
tide regulators that are made throughout the body, and 
function to amplify a cascade of protein cleavages. This 
massive signal amplification increases the assembly of 
attack complexes used by the complement cascade to 
kill cells. The outcome of the complement response
is cell lysis and phagocytosis of the antigens, which
are engulfed by immune system cells. The immune
complexes are cleared from the blood and
deposited in the spleen and liver.

Tissue rejection can also be caused by the T-cell
immune response. Immunosuppressive drugs can
blunt T cell-mediated responses, but not without
a significant risk of drug toxicity. An alternate strategy
involves establishing immunological tolerance through
the use of a pig thymus tissue transplant. Baboons with
transplanted pig organs and with modified immune
systems fail to react to the pig organs as foreign tissue,
allowing the baboons to survive using only the pig
kidney for up to 90 days.

Despite the contributions of animal research to
progress in organ transplantation, the barriers to
transferring pig organs into humans still remain. One
concern, infection with the porcine endogenous
retrovirus (PERV), an RNA virus that can cause immune
system damage, appears unlikely, because tests on
hundreds of patients who were exposed to pig tissues,
such as pancreatic islet cells, skin, and whole livers
and spleens used for extracorporeal blood perfusion,
showed no evidence of PERV infection.


Given all the barriers to the use of pig organs to treat
human patients by xenotransplantation, it is not surprising
that research into the potential of pig organs to meet the
urgent need for transplantable organs has declined.


ANIMAL CLONING


Many animals have now been cloned using somatic
cell nuclear transfer (SCNT), including goats, cattle,
mice, pigs, cats, rabbits, horses, and dogs. These
animals were cloned for various reasons: personal
ones, such as replacing a beloved pet; commercial
agriculture, to promote the inheritance of desired
traits, such as high levels of milk or meat production
by a champion farm animal; the development of an
animal model for research; the recovery of species
facing extinction; or prolonging the genetic lifetime
of a champion racehorse.


Box 14.2 Goodbye, Dolly


It’s So Sad to See You Go

Dolly is gone, but in the decade since her cloning was
announced in 1997, many other animals have been cloned
successfully. The birth and short life of Dolly, the lamb
cloned from an adult cell, not only sparked worldwide
debate about the ethics of cloning humans from adult
cells, but also answered a scientific question that had
been debated since 1938. Embryologists wanted to know
whether, as cells specialized (differentiated) into all the
different cells of the body, the genetic information in
their DNA changed irreversibly, so that the DNA could
no longer provide the information needed to develop an
entire new organism from the start. To find out,
scientists replaced the nucleus of an egg cell with the
nucleus of a somatic (body) cell, rather than a germ line
cell that generates sperm or eggs, using a process called
somatic cell nuclear transfer (SCNT) (see Chapter 12).
This process was used successfully in frogs starting in
the 1950s, when scientists cloned frogs from tadpole
cells, including tadpole intestinal cells, and in fish in
1984, when carp were cloned using cultured fish kidney
cells. SCNT was not successful in mammals until 1996,
when researchers at the Roslin Institute in Scotland
reported that they had cloned two lambs from embryonic
sheep cells. Technically, the development and birth of
the two lambs was a major advance, but questions
remained unanswered about the state of the genetic
information in adult sheep somatic cells.

The announcement on February 27, 1997, of the birth of
Dolly the sheep, who was cloned from adult sheep
mammary gland cells by Ian Wilmut and colleagues at the
Roslin Institute, generated news coverage and debate
around the world. The media questions generally focused
on human cloning, especially on the ethics and risks of
generating a clone from the cells of an adult human. In
subsequent years, the debate about cloning humans
focused on the low success rate of mammalian cloning and
on the high rate of fetal death and developmental
abnormalities. Of 277 nuclear transfers, only 29 produced
embryos suitable for transfer to surrogate mothers, and
Dolly was the only live lamb born from those 29 embryos,
a success rate of about 0.36% (1 in 277). The success
rates with fetal or embryo nuclei were higher, about 5%
to 6%, but the death of lambs at around the time of birth
was high (12%) regardless of the source of the nuclei.
Similar low efficiencies and high rates of fetal and
neonatal deaths are also common in SCNT experiments
conducted using other mammalian species.

Most people, including many scientists and bioethicists,
conclude that the high risk makes it unethical to use
SCNT for human reproductive cloning.
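The cloning efficiencies quoted in Box 14.2, and later in this chapter for rhesus macaques, are simple ratios computed stage by stage. The Python sketch below is illustrative only: the helper name `stage_success` is hypothetical, and the counts are the ones given in the text (277 nuclear transfers, 29 transferable embryos, and 1 live lamb for Dolly; 35 blastocysts out of 213 for the 2007 macaque SCNT work). One lamb from 277 transfers works out to about 0.36%.

```python
def stage_success(stages: list[tuple[str, int]]) -> list[tuple[str, float, float]]:
    """For ordered (stage name, count) pairs, return, for each later stage,
    its fraction of the previous stage and of the starting count."""
    if not stages or stages[0][1] <= 0:
        raise ValueError("need a positive starting count")
    start = previous = stages[0][1]
    rows = []
    for name, count in stages[1:]:
        if count < 0 or count > previous:
            raise ValueError("counts must be non-negative and must not increase")
        step = count / previous if previous else 0.0
        rows.append((name, step, count / start))
        previous = count
    return rows


# Counts as reported in the text.
dolly = [("nuclear transfers", 277), ("embryos transferred", 29), ("live lambs", 1)]
macaque = [("SCNT attempts", 213), ("blastocysts", 35)]

for label, data in (("Dolly", dolly), ("Rhesus macaque", macaque)):
    print(label)
    for name, step, overall in stage_success(data):
        print(f"  {name:<20} {step:7.2%} of previous stage, {overall:7.2%} overall")
```

Because the overall rate is the product of the step rates, modest losses at each stage compound into the very low overall efficiency described in the box.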


Regulation of Animal Biotechnology 


The regulations that control the development and com- 
merce of animal biotechnology products in the United 
States are based on the 1986 guidelines produced by 
the White House Office of Science and Technology 
Policy (OSTP), incorporating policies from the U.S. 
Department of Agriculture, the Environmental Protection 
Agency, the Food and Drug Administration, and the 
Occupational Safety and Health Administration (OSHA). These
guidelines direct that the regulation of animal biotech- 
nology should be based on the products made, not on 
the process needed to make the products. 

Regulations that control the use of transgenic ani- 
mals in the United States are based on the Animal 
Welfare Act, which is administered by the U.S. 
Department of Agriculture. The act, first passed in 
1966 and amended in 1970, 1976, 1985, and 1990,
established regulations to control the handling and 
treatment of animals by commercial and research facil- 
ities. All facilities must be licensed, and research labo- 
ratories must request approval from the Institutional 
Animal Care and Use Committee (IACUC) to comply 
with all the standards and procedures; the IACUC will 
review all proposed research using animals for compli- 
ance with the rules. Each research protocol is reviewed 
by the committee to determine that the proposed study 
is necessary and that the proposed procedures would 
minimize the pain and discomfort to the animals. The
National Center for Import and Export of the USDA Animal
and Plant Health Inspection Service (APHIS) Veterinary
Services regulates the import, export, and
interstate transport of all animals and animal products 
(e.g., tissues, blood, and semen), including transgenic 
animals altered by genetic engineering. The Food and 
Drug Administration, in keeping with the 1986 OSTP 
guidelines, regulates a genetically altered animal as 
a new drug, because the genetic changes affect the 
structure and function of the transgenic animal. 

In 2003, the FDA and the National Academy of 
Sciences (NAS) evaluated the available scientific evi- 
dence and concluded that the food products derived 
from cloned animals and their offspring are as safe to 
eat as food derived from animals that were not cloned. 
Research showed that healthy adult cloned animals 
were virtually indistinguishable from their conven- 
tional counterparts. In January of 2008 a subsequent 
risk assessment performed by the FDA supported the 
scientific findings that cloned animals are a safe source 
of food. The potential use of cloned animals as food
sources for general consumption has prompted calls for
clearly written labels on food to indicate that the product
comes from cloned animals or from transgenic plants (see
Chapter 15). Such calls also extend to products from
animals exposed to biotechnology products. A good
example is milk and dairy products derived from cows
that received recombinant bovine growth hormone
(somatotropin) (bST) to increase milk production.
Bovine growth hormone is normally made in the pituitary
gland of cows and other mammals, but many people want
products from cows treated with the recombinant hormone
to be labeled for the consumer's information. For the commercial
production of bovine growth hormone protein, the 
bST gene was isolated from the cow genome, inserted 
into a DNA vector and introduced into bacterial cells 
using recombinant cloning methods (see Chapter 4). 
The engineered bacteria produce large amounts of the 
bovine growth hormone, which is purified for use as a 
drug to treat the cows to increase milk production. 

Approved by the FDA in 1994 after 14 years of
review, the bovine growth hormone case became the
subject of state laws and legal cases in Vermont and
Illinois. The Ben and Jerry’s Ice Cream company publicly
supported clear labeling as a right-to-know issue, and
probably as a marketing issue as well. Vermont enacted a
state law requiring appropriate labeling. However, a U.S.
Court of Appeals ruled that the Vermont law requiring the
labeling of milk products from bST-treated cows was
unconstitutional, accepting the First Amendment argument
that forcing dairy manufacturers to use labels
distinguishing milk from bST-treated cows from milk from
untreated cows would cause them irreparable harm. The
state of Illinois prohibited labeling milk and milk products
as bST-free despite campaigns by the Pure Food Campaign
(arguing that consumers had the right to know) and the
Humane Farming Association (arguing that mastitis
requiring antibiotic treatment is more frequent in
bST-treated cows). In fact, dairy cows are evaluated
regularly for infections, and sick cows are removed from
milk production.

The effort to label consumer products as “bST-free” 
could represent a marketing strategy to attract consum- 
ers who are environmentally conscious and choose to 
buy organic foods, as well as people with a gut-level 
fear of biotechnology. The label currently allowed by 
federal law reads “No Artificial Growth Hormone,”
accompanied by the statement “No significant differences
have been shown between milk derived from rBST-treated
and non-rBST treated cows.”


Much controversy and public debate worldwide surrounds 
the sale of products made from transgenic animals, includ- 
ing basic food staples like milk. The public favors labeling 
these products for the benefit of the consumer. 


Human Therapeutic Cloning 


Human reproductive cloning, the creation of an iden- 
tical genetic copy of an individual human achieved 
by the transfer of a nucleus from an adult cell into 
an “empty” egg cell, is widely viewed by scientists 
and most of the public as dangerous and unethi- 
cal. Although the procedure has been successful in a 
number of different animal species, the process is very 
inefficient, the frequency of live births is generally low, 
and many of the animals born using the procedure 
have developmental challenges. These problems led 
most scientists, including Ian Wilmut, who
successfully cloned the sheep Dolly, to conclude that the risk
of creating a human child with developmental prob- 
lems was so high that cloning is unethical for use in 
humans. 
Attempts at reproductive cloning of nonhuman
primates such as monkeys have been very difficult.
Gerald Schatten at the University of Pittsburgh
worked for years to clone monkeys using known 
somatic cell nuclear transfer methods. Eventually his 
research team produced two cloned rhesus monkeys 
but they were generated by embryo splitting and not 
by nuclear transfer. The monkey embryos were made in 
the laboratory by combining eggs and sperm and
culturing the cells in the lab to form eight-cell embryos.
The embryo cells were physically separated into four 
embryos containing two cells each. These two-cell 
embryos are genetically identical twins. In initial stud- 
ies, 107 embryos were used to generate 368 split two- 
cell embryos, of which only two embryos survived to 
the blastocyst embryo stage and were implanted into 
separate female monkeys. Only one of the monkeys 
was born and survived, emphasizing the difficulty and 
inefficiency of working with primate embryos. Later, 
the same research group produced a second split- 
embryo monkey. 

The process of reproductive cloning involves the 
production of individual animals with identical nuclear 
genomes (see Chapter 12). The nucleus is removed 
from an adult cell and transferred into an empty oocyte 
(an enucleated egg cell) which no longer contains a 
nucleus. The oocyte begins to undergo cell division 
when activated with a pulse of electricity (or similar 
trigger). After a period of growth in culture, the embryos 
are transferred into the uterus of an appropriate
female animal treated with hormones to support preg- 
nancy. Reproductive cloning can also be accomplished 
using embryonic stem cells, as well as oocytes. 

Therapeutic cloning allows scientists to produce 
genetically matched embryonic stem cells by somatic 
cell nuclear transfer. Until recently, SCNT was accom- 
plished most reliably in mice, and in 2007, scientists 
produced embryonic stem cells from rhesus macaque 
(primate) embryos using SCNT and the nuclei from 
adult male skin cells. As expected, the process was 
quite inefficient; only 16% (35 of 213) developed into
blastocysts. Given the risks of harvesting and
manipulating human oocytes, and a myriad of ethical
concerns, this procedure is unlikely to be applied to
humans.


iPS Cells Cure Sickle Cell in Mice 


In 2007, the Yamanaka group (in Japan) and James 
Thomson and colleagues (at the University of 
Wisconsin) announced that they had induced the 
development of pluripotent human cells that exhibit the 
properties of human embryonic stem cells (see Chapter 
12). They published their research in two highly reputa- 
ble peer-reviewed scientific journals (Cell and Science). 
The amazing feat of producing pluripotent cells with
unlimited developmental potential from adult skin cells
was achieved by introducing a specific set of
transcription factor genes into the skin cells. Yamanaka’s
team inserted genes encoding Oct3/4, Sox2, Klf4, and
c-Myc into the skin cells. Later they found that they could
generate pluripotent mouse stem cells without using the
c-Myc gene, an important improvement because c-Myc had
been implicated in converting iPS cells into cancer cells.
Thomson’s group took a similar approach and created 
iPS cells from adult skin cells using DNA encoding 
the transcription factor proteins Oct4, Sox2, Nanog, 
and Lin28. These transcription factor proteins work 
together as master regulator proteins that keep cells in 
an embryonic-stem-cell-like state (see Chapter 12). 

The potential of iPS cells to treat a genetic dis- 
ease has been realized in the research lab. In 2007, 
researchers at MIT and the University of Alabama 
showed that they could successfully treat mice with 
sickle-cell anemia by reprogramming the mouse cells 
to make induced pluripotent stem (iPS) cells (see 
Chapter 12). To try to treat sickle cell disease in mice, 
the scientists induced the mouse iPS cells to develop 
into the precursor stem cells in adult bone marrow 
that generate mature blood cells (Figure 14.15). Sickle
cell anemia, the most common inherited blood disorder
in the United States, is caused by a single base pair
change in the β-globin gene (see Chapter 10). As a
result, the mutant β-globin protein forms defective
hemoglobin complexes called hemoglobin S (HbS),
which distort the red blood cells into sickle shapes that
clog the blood vessels and cause severe pain and other
symptoms (see Chapter 10).

FIGURE 14.15 Sickle cell anemia successfully treated in mice by reprogramming the mouse genome. The approach used by scientists (MIT/Whitehead) to successfully treat mice carrying the human beta-globin gene with the sickle-cell mutation begins by directly reprogramming the mice’s own cells to an embryonic-stem-cell-like state.
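The idea that a single base pair change can alter a protein can be made concrete with the well-known sickle-cell mutation: in codon 6 of the β-globin gene (numbered, as is conventional, without the initial methionine), the codon GAG, which codes for glutamic acid, becomes GTG, which codes for valine. The Python sketch below encodes only these two codons, not the full genetic code, and the helper names are hypothetical.

```python
# Only the two codons needed here; this is not a complete genetic code table.
CODON_TO_AMINO_ACID = {"GAG": "glutamic acid (Glu)", "GTG": "valine (Val)"}


def substitute_base(codon: str, position: int, new_base: str) -> str:
    """Return a copy of a 3-letter codon with the base at 0-based `position` replaced."""
    if len(codon) != 3 or not 0 <= position < 3:
        raise ValueError("expected a 3-base codon and a position of 0, 1, or 2")
    if new_base not in ("A", "C", "G", "T"):
        raise ValueError("new_base must be one of A, C, G, T")
    return codon[:position] + new_base + codon[position + 1:]


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


normal = "GAG"                            # codon 6 in the normal beta-globin gene
sickle = substitute_base(normal, 1, "T")  # middle base A -> T gives the sickle codon
print(f"{normal} -> {CODON_TO_AMINO_ACID[normal]}")
print(f"{sickle} -> {CODON_TO_AMINO_ACID[sickle]}")
print("bases changed:", count_differences(normal, sickle))
```

One changed base out of three swaps one amino acid for another, and in the β-globin protein this single substitution is what produces hemoglobin S.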

The scientists replaced the mutant β-globin gene in
the iPS mouse cells (see Chapter 12) with the wildtype
β-globin gene DNA to try to correct the sickle cell
defect using a gene therapy strategy. Blood-forming
precursor cells derived from the genetically corrected
iPS cells were then transplanted into the mice with
sickle cell disease. The treated mice expressed the
wildtype β-globin proteins and produced functional
hemoglobin HbA complexes, effectively curing sickle
cell anemia in these mice using the iPS cells.

Scientists are encouraged by the generation of 
“embryo free” iPS cells that could eventually provide 
genetically matched, differentiated cells for the
personalized treatment of many human diseases. Of course,
additional research is needed to establish conditions 
for the isolation and differentiation of the iPS cells to 
accommodate all the different specialized cells needed 
to treat most human genetic diseases. 


It was a major development in animal biotechnology as 
well as in human health and medicine when scientists 
successfully induced adult human skin cells to become
like embryonic stem cells. Like other pluripotent cells, the
iPS cells have the ability to develop into a large number of 
different types of cells in the human body. 


SUMMARY 


The field of animal biotechnology will continue to pro- 
vide many benefits to human health and medical care 
in the future. The methods used to create genetically 
altered animals have advanced to the point where the 
use of transgenic animals is routine. Improvements 
in technologies involving microinjection, embryonic 
stem cells, and new vectors have increased the effi- 
ciency of introducing genes into cells and into new 
animals. Important genes have been studied by delet- 
ing gene copies from animal genomes using knockout 
technology. Transgenic animals have been created and 
manipulated using RNA interference (RNAi) and the Cre/lox
systems, which permit scientists to study the
tissue-specific expression of genes.

Transgenic animals have been developed as 
research tools that provide animal models of genetic 
diseases. The development of transgenic animals to 
produce biopharmaceuticals has become an active 
area of research and currently involves engineering 
animals so that the drug, usually a protein, is secreted 
in the milk or in other products such as hair. Although a
great deal of progress has been made in this research, 
the efforts to move this research into human clinical
trials have been understandably slowed because of the
regulatory agencies’ concerns about the purity, pro- 
duction, and safety of biopharmaceuticals made for 
humans by transgenic animals. 

There is a recognized but as yet unrealized poten- 
tial to genetically engineer agriculturally important 
animals to improve the reproduction and health of the 
animals and to increase the quality of products such 
as meat and wool. The FDA has concluded that milk
or meat from cloned animals does not present a significant
risk even if it enters the food chain. The
resistance of animal rights activists and the uncertainty 
about consumer acceptance of transgenic animal 
products have slowed corporate-sponsored research
and development.

The transplantation of organs from animals into 
humans (xenotransplantation) has the advantage that it 
could treat organ failure and reduce the waiting times 
for human organs for transplant, but concerns about 
safety and the recent progress made in therapeutic 
cloning technology have reduced commercial interest 
in this approach. 

The safety and ethics issues surrounding animal 
reproductive cloning research have essentially halted 
all human reproductive cloning supported by federal 
funds in the United States. Therapeutic cloning offers
a very promising approach for the future treatment
of human diseases. An exciting major step forward 
occurred when scientists showed that a small number 
of transcription factor genes are capable of converting 
human skin cells into iPS cells, pluripotent cells that 
have the ability to develop into a large number of spe- 
cialized human cells. This discovery shifted the focus 
away from human embryos as the major source of 
human embryonic stem cells for research and for many 
applications of stem cell technology (see Chapter 12). 
Experiments with mice carrying the mutant human
β-globin gene that causes sickle cell anemia
showed that induced pluripotent stem (iPS) cells
have the potential to cure human genetic disease.


REVIEW 


To test your understanding of the concepts in this 
chapter, answer the following questions: 


1. How were the earliest transgenic animals 
produced? 

2. What is a blastocyst, and what role does it play in 
the generation of embryonic stem cells? 

3. What are knockout transgenic animals, and how 
are they produced? 

4. Why is it necessary to make certain genetic altera- 
tions in the germ line cells? 
5. What arguments favor the production of
biopharmaceuticals in the milk of transgenic animals?

6. What are some of the challenges scientists face 
when trying to produce drugs and other products 
in transgenic animals? 

7. What social and ethical issues surround the field 
of xenotransplantation? 

8. What is human reproductive cloning, and why is it 
considered to be unethical? 

9. What are some of the biopharmaceutical products 
that have been successfully produced in trans- 
genic animals? 

10. How does reproductive cloning differ from thera- 
peutic cloning? 
Scientists Create “No Tears” Onions


Agence France-Presse, February 1, 2008 

Scientists in New Zealand and Japan have created a 
“tear-free” onion using biotechnology to switch off the 
gene behind the enzyme that makes us cry, one of the 
leading researchers said Friday. The discovery could signal 
an end to one of cooking’s eternal puzzles, why does cut- 
ting up a simple onion sting the eyes and trigger teardrops? 
The research institute in New Zealand, Crop and Food, 
used gene-silencing to make the breakthrough which it 
hopes could lead to a prototype onion hitting the market in 
a decade’s time. 


The engineering of a no-tears onion represents the 
application of one of the most advanced methods of 
genetic engineering to a common consumer prob- 
lem. Researchers discovered that human tears are a 
response to the sulfur compounds that are released by 
the onion when the onion cells are sliced. The no-tears 
onion was created using an RNAi method where small 
interfering RNAs (siRNAs) were used to prevent the 
expression of a specific gene and the encoded protein 
product. The 2006 Nobel Prize in Physiology or Medicine was
awarded to Andrew Fire of Stanford University and
Craig Mello of the University of Massachusetts Medical
School for their discovery of RNA interference, gene
silencing by double-stranded RNA (see Chapter 11). RNA
interference using siRNA gene suppression has been used
to silence the expression of different genes in many
animal and plant cells in addition to the onion plant.
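siRNA silencing depends on base pairing. The guide strand of an siRNA is complementary to, and runs antiparallel to, the target region of the mRNA it silences. The Python sketch below shows only that pairing rule. The mRNA fragment is hypothetical and is not the onion gene, and the function names are invented for illustration. The search finds only perfect matches. Real siRNA design also weighs factors such as specificity for the intended gene, which this sketch ignores.

```python
# Watson-Crick pairing partners for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (written 5'->3')."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


def find_target_site(mrna: str, guide: str) -> int:
    """Return the 0-based start of the mRNA region that pairs perfectly
    with the guide strand, or -1 if there is no perfect match."""
    return mrna.find(reverse_complement_rna(guide))


# Hypothetical mRNA fragment (not the real onion gene).
mrna = "AUGGCUAGCUUCGAAGGCUACGAUCCGUAAGGC"
# Design a 21-nucleotide guide strand against mRNA positions 5-25.
guide = reverse_complement_rna(mrna[5:26])

print(len(guide), find_target_site(mrna, guide))  # 21 5
print(find_target_site(mrna, "A" * 21))           # -1
```

A guide that does not pair with the message finds no target. This sequence specificity is why siRNAs can switch off one chosen gene.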


LOOKING AHEAD 


This chapter describes how the tools of molecular 
biology and recombinant DNA technology have been 
used to analyze and modify the DNA found in agri- 
culturally important plants. Like transgenic animals, 
genetically modified (GM) plants are useful but also
have potential dangers and raise important controver- 
sies. On completing the chapter, you should be able to 
do the following: 


• Describe the different approaches used to introduce
new genes into plants.

• Explain why and how a gun is sometimes used in
changing the genes of a plant.

• List some major food crops that have been
genetically altered.

• Describe the mechanisms used by pest- and
weed-killer-resistant genes that are advantageous to
farmers who plant genetically altered crops.

• Describe the roles of the Environmental Protection
Agency (EPA), Food and Drug Administration (FDA),
and the U.S. Department of Agriculture in the
regulation of GM plants.

• Explain the potential advantages and disadvantages
of an application of agricultural biotechnology.

• Describe the efforts to produce protein drugs in
plants.

• Describe the environmental and safety issues raised
concerning agricultural biotechnology.

• Describe how the development of plant or agricultural
sources of biofuels will reduce the use of fossil
fuels in the United States.

• Consider and discuss the impact of agricultural
biotechnology in developing countries.


INTRODUCTION 


Scientists working in the field of agricultural biotech- 
nology use the tools of molecular biology to analyze 
the genetic information in plants, to detect bacterial 
and viral infections in plants, and to modify the genetic 
information in plants, which alters the proteins made 
by the plant cells. Genetically modified soybean and 
corn plants were first tested in the 1980s to determine 
how the modified plants would respond to herbicides 
(chemical weed-killers). GM plants with the genetically
enhanced ability to repel invasion by insect pests
were first commercially available in the United States
in 1995-1996.

Box 15.1 Ice-Minus Bacteria Prevent Frozen Strawberries


The first crisis in the new field of agricultural biotechnology did
not involve a genetically altered plant but a genetically altered
bacterium. Steve Lindow, a researcher in California, studied
Pseudomonas syringae, a bacterium commonly found growing on
the surfaces of plants, including strawberry plants in the field.
These bacteria make an ice-nucleation protein that helps ice
crystals form on plant surfaces, damaging strawberries when the
temperature drops below freezing. Removing the gene for this
protein produces “ice-minus” bacteria, which offered a way to
prevent this frost damage. To test this idea, an ice-minus strain
of Pseudomonas lacking the ice-nucleation gene, called Frostban,
was engineered. The researchers wanted to test whether the
engineered ice-minus Pseudomonas could outgrow the resident
population of Pseudomonas and protect the strawberries in the
field from the cold temperatures. At the time, the proposed test
was approved by the National Institutes of Health Recombinant
DNA Advisory Committee (RAC). However, Lindow's proposal to
test Frostban in an open field raised serious concerns among
local residents and national opponents of genetic engineering.
Eventually the Frostban test was performed with technicians and
observers in the field outfitted in HazMat jumpsuits (Figure 15.1).
Although the test was technically successful (the Frostban
bacteria grew more rapidly than the resident Pseudomonas
bacteria and protected the strawberries from low temperatures),
the controversy discouraged any further development of Frostban
as a commercially available biotech product.


FIGURE 15.1 Frostban ice-minus bacteria are used to treat strawberry
plants. (A) Technician in biohazard suit treats strawberries with
Frostban; (B) Frostban is a genetically modified strain of
Pseudomonas bacteria.


The technical successes of genetically modified
organisms (GMOs) used for agricultural purposes are
numerous, but agricultural biotechnology and GMOs
have not been widely accepted by the international
community. The public is particularly concerned about
the use of unfamiliar gene-altering technologies to 
change the properties of the foods they eat, especially 
familiar staples like milk (see Chapter 14). In the United 
States and Europe, agricultural biotechnology has been 
portrayed by opponents as an unnatural process that has 
unknown effects on the crops, the environment, and 
the health of humans and animals. To try to reassure the 
public, the companies developing GMOs sought gov- 
ernment regulations to oversee advances in agricultural 
biotechnology. The Environmental Protection Agency
(EPA), the U.S. Department of Agriculture (USDA), and
the Food and Drug Administration (FDA) developed a
regulatory system to review research advances in the field.
The successful use of GMOs in agricultural applica- 
tions is reflected in the number of global acres devoted 
to growing crops for biotechnology applications, which 
increased from 4.3 million acres in 1996 to 252
million acres in 2006. The United States leads the world in
the number of acres planted with genetically modified
crops, with about 135 million acres (54.6 million hectares)
of soybeans, corn, cotton, canola, squash, papaya, and alfalfa.
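The global acreage figures above imply very rapid growth. A short calculation in Python uses only the two numbers in the text: 4.3 million acres in 1996 and 252 million acres in 2006. It assumes steady compound growth from year to year. Real adoption did not necessarily follow such a smooth curve, so treat the annual rate as an average.

```python
acres_1996 = 4.3e6
acres_2006 = 252e6
years = 2006 - 1996

fold_increase = acres_2006 / acres_1996
# Compound annual growth rate: the constant yearly rate that would
# turn the 1996 acreage into the 2006 acreage over `years` years.
annual_growth = fold_increase ** (1 / years) - 1

print(f"{fold_increase:.1f}-fold increase")            # 58.6-fold increase
print(f"{annual_growth:.0%} average growth per year")  # 50% average growth per year
```

By this measure, global biotech crop acreage grew by roughly half again each year over the decade.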


Flavr Savr Tomatoes 


A second crisis in agricultural biotechnology occurred 
after the federal agencies responsible for regulating 
agricultural biotechnology were established. GM plants 
that had been extensively tested in closed greenhouses 
were granted nonregulated status by the Animal and Plant
Health Inspection Service (APHIS) of the USDA and grown in
open fields. Calgene, a California company, was given
nonregulated status for Flavr Savr tomatoes, which were
engineered to ripen but not soften while growing on the
vine (Figure 15.2). This tomato was made more resistant
to rotting by using antisense RNA technology (see Chapter 11)
to silence the gene encoding the enzyme
polygalacturonase (PG). Normally the unmodified tomatoes
are harvested before they are ripe, which permits ease
of handling and shipping, but then the unmodified
tomatoes must be artificially ripened after harvest using
ethylene gas. The Flavr Savr tomatoes could be allowed
to ripen on the vine without compromising shelf life.


FIGURE 15.2 Calgene created Flavr Savr tomatoes, which were
engineered to ripen but not to get soft on the vine. The tomato was
made more resistant to rotting using antisense RNA technology to
silence a key enzyme in the biochemical process of decomposition.

The genetically engineered Flavr Savr tomatoes 
were approved after the FDA concluded that they
were as safe as conventional tomatoes, and they were
first sold to consumers in 1994. However, sales
were low, as many people, including restaurant chefs,
consumers, and anti-GMO activists, boycotted the 
Flavr Savr tomato. In the end, the Flavr Savr tomatoes 
were a disappointment because although the silenced 
PG gene had a positive effect on shelf life, it did not 
alter the firmness of the GM tomatoes. This meant that 
the Flavr Savr tomatoes had to be harvested like any 
other unmodified vine-ripe tomatoes, negating one of 
the advantages of making the genetically altered Flavr 
Savr tomatoes. However, after the Flavr Savr tomato 
plants were approved, scientists tested other GMOs, 
including corn that is resistant to the European corn 
borer, virus-resistant squash, herbicide-tolerant cotton,
soybeans resistant to an herbicide, and canola (rape- 
seed) oil with commercial and industrial uses. 


A primary focus of agricultural biotechnology research 
involves the development of transgenic plants designed to, 
for example, improve the taste of the fruit or increase the 
disease resistance of the transgenic plants. 


AGRICULTURAL BIOTECHNOLOGIES 
Genetically Modified Organisms (GMOs) 


The goal of transferring new genes into plants is to alter 
the plant genome with the intention of expressing trans- 
genic proteins in the plant. Several methods have been 
tested since the early 1980s to find the best way to 
introduce DNA into plant cells. For the DNA to trans- 
form the cells, the DNA must enter the cell nucleus 
where the genes carried on the vector are expressed as 
mRNAs and translated into transgenic proteins in the 
plant cells. 


A Complete Plant Can Be Grown from 
a Single Cell 


Since the 1920s, scientists have known that it is possi- 
ble to grow an entire new plant from a single root cell 
that is nurtured in tissue culture media and exposed to 
appropriate plant hormones (Figure 15.3). The fact that 
an entire plant can be grown from a single cell indicates 
that the genetic changes introduced into the single cells 
will be transmitted to the whole plant and can give rise
to many genetically identical plants (clones).


FIGURE 15.3 A complete plant can be generated from a single
plant cell. An entire new plant can be generated from a single cell
that is grown in tissue culture media. The cell is treated with
appropriate plant hormones that induce the expression of certain
genes during development. The root cells develop into calluses in
culture, which then grow into whole plants.


In the United States, GM plants can be patented,
which temporarily protects the company's invention
or discovery. However, the Plant Patent Act, passed in
1930, gave protection only to asexually propagated
plants (from bulbs or cuttings) because at that time
plant scientists doubted that sexually propagated
plants (from seeds) would reliably pass their traits
on to their offspring. Later, contrary genetic evidence
led to the Plant Variety Protection Act of 1970, which
provided patent-like protection for sexually propagated
plants. The demonstration of
successful recombinant DNA techniques in microor- 
ganisms led researchers to experiment with mutations 
in plant genes (see Chapter 4). Success with GM plants 
convinced companies to invest in research to generate 
new transgenic plants that could be patented by the 
companies and sold to growers. 


INSERTING GENES INTO PLANTS 


Agrobacterium tumefaciens


A common method used to introduce genes into plants
involves the soil bacterium Agrobacterium tumefaciens.
Outside the lab, this bacterium causes crown gall
disease, which produces large tumor-like growths on the
trunks and branches of certain trees (Figure 15.4). These
tumors grow because the A. tumefaciens bacterium can
take over the plant cell and alter the cellular metabolism.
A. tumefaciens bacteria contain a large plasmid, the Ti
(tumor-inducing) plasmid, which carries a segment of
bacterial DNA called the T-DNA (transfer DNA). When the
bacteria adhere to a plant cell, the T-DNA is transferred
into the plant cell and then into the nucleus, where it
becomes inserted into the plant cell genome. As a result,
the plant cell expresses the bacterial genes carried on the
T-DNA. These genes, which direct the production of plant
growth hormones, alter the normal growth of the plant
cells, causing the crown gall tumors.


FIGURE 15.4 Agrobacterium tumefaciens can move DNA into plant cells.
(A) Agrobacterium tumefaciens crown gall tumor. (B) Agrobacterium
tumefaciens bacteria adhere to the surface of the plant cell, an
essential step in the transfer of DNA from the bacterium into the
plant cell.


To introduce foreign DNA into a plant genome,
scientists insert the T-DNA into a bacterial plasmid
that is grown in large amounts in E. coli cells in the
laboratory. The plasmid DNA contains many sites that
are recognized and cut by restriction enzymes. These
enzymes cut a double-stranded DNA molecule only at
specific DNA sequences (see Chapter 4). Any change in
a restriction site, even a single base pair change, will
prevent the restriction enzyme from cutting the DNA
at the altered site. For the A. tumefaciens plasmid to
be used as a DNA vector to carry genes into plant
cells, scientists used recombinant DNA methods
to remove the tumor-causing genes from the Ti plasmid
and to add an antibiotic resistance gene to the plasmid
DNA. This gene permits bacterial cells containing the
modified plasmid (and the antibiotic resistance gene)
to grow in the presence of the antibiotic, whereas
bacteria lacking the plasmid cannot grow. The modified
Ti plasmid DNA was introduced into A. tumefaciens
bacteria, which were then used to infect the plant cells
(see Figure 15.4).
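The exact-sequence rule for restriction enzymes can be sketched in Python. The example uses EcoRI, a widely used enzyme that recognizes GAATTC and cuts between the G and the first A. The plasmid sequence and the helper names `find_sites` and `cut` are hypothetical. Only one strand is modeled. Real digestion cuts both strands and leaves short single-stranded "sticky" ends.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (cuts G^AATTC)


def find_sites(dna: str, site: str = ECORI_SITE) -> list[int]:
    """Return the 0-based start of every exact match of `site`."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def cut(dna: str, site: str = ECORI_SITE, offset: int = 1) -> list[str]:
    """Split one strand at each site, `offset` bases into the site."""
    fragments, last = [], 0
    for pos in find_sites(dna, site):
        fragments.append(dna[last:pos + offset])
        last = pos + offset
    fragments.append(dna[last:])
    return fragments


plasmid = "TTGAATTCAGGCCGAATTCTT"               # hypothetical: two EcoRI sites
mutant = plasmid.replace("GAATTC", "GATTTC", 1)  # one base changed in the first site

print(cut(plasmid))  # ['TTG', 'AATTCAGGCCG', 'AATTCTT']
print(cut(mutant))   # ['TTGATTTCAGGCCG', 'AATTCTT']
```

Changing a single base (A to T) in the first site leaves the enzyme with only one place to cut. This is the behavior that lets scientists cut plasmids at precise, predictable positions.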

Over the years, as scientists learned more about the 
molecular mechanisms of the Ti plasmid DNA, they 
devised powerful strategies to apply A. tumefaciens
technology to many areas of agriculture. However, the
increasing use of vectors containing antibiotic resistance
genes has raised concerns about potential environmental
and health dangers. The widespread planting of GM
crops carrying vectors with antibiotic resistance genes 
has the potential to spread the antibiotic resistance 
genes to pathogens living in the soil. These organisms 
can acquire antibiotic resistance genes, making the 
pathogens much more dangerous and reducing the 
overall effectiveness of the handful of antibiotics avail- 
able to treat diseases in humans and animals. 

To eliminate this potential danger, scientists have 
developed methods to remove the antibiotic resistance
gene from the plant genome after the gene has
served its purpose. For example, researchers used the
Cre-loxP site-specific recombination system to delete
antibiotic resistance gene DNA from the plant genome
(see Chapter 14). Briefly, the Cre enzyme catalyzes
recombination between two loxP DNA sequences that
face the same direction, which deletes the gene that was
flanked by the loxP sites and leaves a single loxP site
in the cell genome.
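The net result of Cre-loxP deletion can be modeled as simple string editing. This sketch assumes the standard 34-base-pair loxP sequence and two sites in the same orientation. The marker sequence is a hypothetical placeholder for an antibiotic resistance gene, and the function name `cre_excise` is invented for illustration. The sketch does not model how the Cre enzyme works. It shows only the outcome: the flanked DNA is removed and one loxP site remains.

```python
# Standard 34-bp loxP site: 13-bp inverted repeat, 8-bp spacer, 13-bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(dna: str, site: str = LOXP) -> str:
    """Model Cre-mediated excision between two same-orientation loxP sites.

    Everything from the start of the first site up to the start of the
    second site is removed, so exactly one loxP copy remains.
    """
    first = dna.find(site)
    if first == -1:
        return dna
    second = dna.find(site, first + len(site))
    if second == -1:
        return dna  # a single site cannot support excision
    return dna[:first] + dna[second:]


left_flank = "GGGGCCCC"   # hypothetical plant genomic DNA
marker = "ATGCCCGGGTAA"   # hypothetical stand-in for a resistance gene
right_flank = "TTTTAAAA"  # hypothetical plant genomic DNA

construct = left_flank + LOXP + marker + LOXP + right_flank
after_cre = cre_excise(construct)
print(marker in construct, marker in after_cre)      # True False
print(after_cre == left_flank + LOXP + right_flank)  # True
```

Because the function requires two matching sites, DNA with a single loxP site passes through unchanged.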


Transgenic animals and transgenic plants can be produced 
in more than one way depending on the characteristics of 
the animal or plant, the transgene, and the expected out- 
come of transgene expression. 


Gene Transfer into Protoplasts 


Agrobacteria can naturally infect only certain plants, so
new methods have been developed to introduce new
genes into plants that are resistant to Agrobacterium
infection. Successful approaches include transferring
DNA into plant protoplast cells or shooting the DNA
into cells using a gene gun (see Chapter 11). All plant
cells have an external cell wall made largely of
cellulose, a carbohydrate polymer, which forms part of the physical
barrier that protects the cell and the nucleus from the
outside environment. Enzymes that degrade cellulose
are often used to remove the plant cell wall to generate
protoplasts (Figure 15.5). The uptake of DNA by
protoplasts can be increased by treatment with
polyethylene glycol or by the use of a mild electric pulse,
a method called electroporation. Both methods gently perturb
the phospholipid cell membrane and promote DNA
uptake into the cell (see Chapter 10 and Chapter 11).


SHOOTING GENES INTO CELLS 


Another method used to make plant cells take up
foreign DNA is the biolistic gene gun (Figure 15.6).
Tiny particles of tungsten or gold are coated with the DNA
to be delivered into the cells and loaded into a shell 
in the gene gun. The shell is rapidly accelerated out of 
the gene gun, using either a pneumatic or a mechani- 
cal propulsion method. A perforated plate stops the 
shell to avoid damage to the cells, while the metal par- 
ticles continue into the cells and deliver the DNA into 
the chloroplast, nucleus, or mitochondria. 


Insertion of DNA into Chloroplasts 


Green plant cells contain specialized membrane-bound
organelles called chloroplasts, which are the
site of photosynthesis, the process by which plants use
sunlight to produce carbohydrates from carbon dioxide
(CO₂) (Figure 15.7). The gene gun can deliver the DNA
vector (and genes) into the chloroplasts within a plant
cell. In these experiments, the DNA vector contains
the gene(s) to be transferred, as well as a chloroplast
marker gene and an antibiotic resistance gene that
can be expressed in the chloroplast. The DNA vector
also contains chloroplast-specific sequences that direct
integration of the DNA vector into a precise location
in the chloroplast genome. Chloroplasts can express
high levels of foreign proteins that fold properly and
are biologically active inside the chloroplasts.


FIGURE 15.5 Protoplasts are plant cells with the cell walls
removed. These protoplasts were made from spinach cells.


FIGURE 15.7 Chloroplasts are the site of photosynthesis in green
plant cells. A false-color transmission electron micrograph of a
chloroplast from a tobacco leaf shows the many stacked membranes in
the chloroplast.

Since the early 1990s, numerous foreign genes have 
been expressed in the chloroplasts of higher (multicellular)
plants, including herbicide resistance genes, insect
resistance genes, a drought resistance gene, and genes
whose products can degrade or metabolize mercury. A number of
biopharmaceutical compounds and a vaccine against
cholera toxin have been produced using chloroplasts,
which makes chloroplasts a possible source of vaccines against serious diseases.


Scientists routinely introduce DNA into the nuclei of differ- 
ent types of cells, but now it is possible to introduce DNA 
into the genomes of mitochondria and chloroplasts, which 
are organelles located in the cytoplasm of eukaryotic cells. 


Improving Vector Chromosomes


The production of transgenic plants has some of the
same drawbacks that limit options when constructing
transgenic animals. Anytime a gene is added to a plant
or animal genome, the new gene can physically disrupt
an essential gene in the genome or alter gene expression
by inserting into a control region of a gene, such
as a promoter or enhancer. Recently Daphne Preuss
(University of Chicago and Chromatin, Inc.) and her
team constructed minichromosomes, circular segments
of the corn genome that replicate as circles in the corn
cells (Figure 15.8). The corn minichromosomes contain
repeated DNA sequences that are typically found in
the centromere of each plant chromosome, the region
of the chromosome that attaches to spindle fibers
and moves the chromosome during cell division (see
Chapter 9). The centromere DNA on the minichromosome
allows the circular chromosomes to be inherited
correctly through several generations of corn cells. In
comparison, minichromosomes without centromeres
are very unstable and are physically lost when the cell
divides. Without a functional centromere region, the
minichromosome cannot attach to the spindle fibers
when cells divide.


FIGURE 15.6 A gene gun successfully delivers DNA into plant cells.
(A) The gene gun. (B) (1) The gene gun fires 'bullets' made of tiny
particles of gold or tungsten coated with vector and gene DNA.
(2) The gene gun is fired at the plant cells, which forces the DNA
particles through the plant cell wall without damaging the cell.
(3) The gene gun can deliver the DNA into the nucleus, mitochondria,
or chloroplast organelles depending on the type of vector used.


FIGURE 15.8 Minichromosomes are small versions of native
chromosomes. The minichromosome produced by telomere-mediated
truncation is indicated by the arrowhead. Each minichromosome
contains centromere DNA (green). Telomeres are added to the
chromosome by transformation; the chromosome breaks at that site,
and the added genes are located at the tip of the chromosome (red).
(Inset) Transgenes (red), centromere region (green), and the merged
image (red/green). Chromosomes are stained blue.


Minichromosomes are effective vectors for use in complex 
plant cells because they contain the DNA elements needed 
to function as native chromosomes, including centromeres
for stable inheritance and replication origins to ensure
chromosome duplication. The corn minichromosomes are circular,
but minichromosomes can also be linear double-stranded DNA molecules.
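The link between a centromere and stable inheritance is a matter of compounding. A small risk of loss at each division adds up over many generations of cells. The Python sketch below is a toy model, not measured biology. It assumes each division loses the minichromosome independently with a fixed probability and that a lost copy is never regained. The two loss rates are hypothetical values chosen only to show the contrast.

```python
def fraction_retained(loss_per_division: float, divisions: int) -> float:
    """Expected fraction of cells still carrying the minichromosome.

    Toy model: each division independently loses the minichromosome with
    the same probability, and a lost minichromosome is never regained.
    """
    return (1.0 - loss_per_division) ** divisions


DIVISIONS = 20
# Hypothetical per-division loss rates, for illustration only.
with_centromere = fraction_retained(0.01, DIVISIONS)
without_centromere = fraction_retained(0.5, DIVISIONS)

print(f"with centromere:    {with_centromere:.3f}")
print(f"without centromere: {without_centromere:.7f}")
```

Under these assumed rates, most cells still carry a centromere-containing minichromosome after 20 divisions. A minichromosome without a centromere is almost completely gone.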


HOW AGRICULTURAL BIOTECHNOLOGY 
IS REGULATED IN THE UNITED STATES 


The agencies of the United States federal government, 
including the Environmental Protection Agency, the 
Department of Agriculture, and the Food and Drug 
Administration, are responsible for reviewing proposals 
for the field testing and commercialization of genetically 
modified crops. Much of the regulatory review has focused
on the products made for human consumption and 
not on the scientific methods used to create the prod- 
uct. Safety issues include the possibility that insertion of 
a gene into a genome might disrupt the function of an 
essential gene or cause the production of new proteins 
that might cause an allergic reaction in people or be 
otherwise harmful to people or to the environment. 

Farmers, universities, and seed companies have 
been altering the properties of plants for hundreds of 
years through conventional breeding programs. The 
domestication of wild plants, which began 10,000 years 
ago, changed the genetic makeup of plants through 
conventional breeding. For example, Native Americans 
derived what we know as modern corn from teosinte, a 
related plant with small kernels and ears (Figure 15.9). 
Genetic manipulation using recombinant DNA meth- 
ods to produce transgenic plants and animals is often 
more precise than the genetic changes that occur by 
conventional breeding methods. 


WHAT HAS CAUSED RESISTANCE TO 
AGBIOTECH IN EUROPE? 


The European Union (EU) limits the cultivation of
genetically engineered crops and requires stringent testing.
In 2006, Spain, France, Germany, the Czech Republic,
and Slovakia each reported growing about 1.23 million
acres of GM corn. The EU has authorized the importation
of GM food and animal feed produced from plants
(corn, oilseed canola, soybeans, and cotton) that are
protected from insects. Herbicide-resistant sugar beets
and potatoes with altered starch composition are under
review. The resistance to GMOs that is evident in most
EU countries is reflected in the different approaches to
uncertainty and risk taken by regulators in the United
States and the European Union. U.S. regulators
make their decisions on GMOs based on a systematic
risk assessment and application process devised by
each agency. The EU relies on the precautionary
principle, formulated at a United Nations meeting on
biological diversity, which says that if the negative impacts of a
proposed new technology (such as genetic engineering)
may be severe and are unknown, it is prudent to wait
until these negative consequences can be shown not
to occur before the technology is allowed. Using this
approach, the scientific difficulty in proving a negative
essentially prevents the development of new
technologies such as agricultural biotechnology.


FIGURE 15.9 Comparison of teosinte and maize (corn). (A) Teosinte
and a reconstructed primitive maize, created by crossing teosinte
with Argentine popcorn and selecting the smallest offspring. (B) Ear
of teosinte (left, Zea mays ssp. mexicana), maize (right), and the F1
hybrid (center) made by crossing the teosinte and maize.


Public opinion about biotechnology and GM plants is
shaped in large part by the extreme views reported in the
popular press rather than by scientific facts. Many
people with different motives are invested in the
controversy over transgenic plants and animals.


NEW GENES IN THE FIELDS 


Most people in the United States eat a diet that has 
included steadily increasing amounts of products 
derived from genetically engineered crops since 1992. 
These products include soy food, food made with corn 
including corn flour, corn oil and corn syrup, canola 
oil, cottonseed oil, sugar from beets, potatoes, squash, 
and papaya. Recently the Agricultural Research
Service (ARS), the research division of the USDA,
developed and received “non-regulated” status for a 
genetically modified plum tree that is resistant to the 
plum pox virus. The ARS accomplished this feat using 
RNAi technology to silence the production of the plum 
pox virus coat protein, which halted the virus lifecycle 
in the plum tree. 

Many transgenic plants were engineered to carry 
genes that confer resistance to an insect, a virus, or an 
herbicide. Herbicides kill crop plants as well as weeds. 
A crop that is resistant to an herbicide allows the farmers 
to use substantially fewer herbicide treatments to control 
weeds without harming the crop. Many types of transgenic
plants are engineered to be resistant to the herbicide
glyphosate, which Monsanto sells as Roundup.
Glyphosate is toxic to plants because it inhibits the
action of an enzyme called 5-enolpyruvylshikimate-3-phosphate
synthase (EPSPS), which is required for the
plant to synthesize the aromatic amino acids
(phenylalanine, tyrosine, and tryptophan). In this
case, scientists used a gene from E. coli that encodes
the bacterial form of the EPSPS enzyme, which confers
resistance to glyphosate. When inserted into the plant 
genome, the bacterial gene makes the recombinant 
plants resistant to the glyphosate herbicide. 

The soil bacterium Bacillus thuringiensis (Bt) has
long been marketed as a biopesticide because, even
without carrying a foreign gene, Bt bacteria are toxic
to many kinds of insects. Recombinant Bt cells have
also been engineered to produce many different protein
products, including pesticides.

Wild-type Bt cells normally express genes for six
different crystalline (Cry) toxin proteins, which dissolve
in the alkaline environment of the insect’s midgut
(Figure 15.10). The Cry proteins bind to Cry protein-specific
receptors on the surfaces of the cells that
make up the intestine wall in the insect. This causes
pores to form that allow lethal fluid uptake by the
insect’s gut. Bt cells have limitations as a biopesticide
because the powdered preparation of Bt cells must be
applied frequently, since the Bt powder is washed away
by rain. Also, the toxicity of Bt becomes unpredictable
in prolonged sunlight.

FIGURE 15.10 Bt powder threatens Monarch butterfly larvae. (A) Adult monarch butterfly. (B) Monarch butterfly larvae.

Recombinant Bt cells offer important advantages
for the expression and preparation of large amounts
of eukaryotic proteins in the lab. A DNA vector carrying
a promoter that normally drives transcription of the
crystalline protein genes is used to clone foreign protein
genes. The foreign protein coding gene is inserted
into the vector under the transcriptional control of
the promoter for the crystalline protein gene. Like the
normal crystalline proteins, the recombinant proteins
are made at high levels and can be harvested in large
quantities from Bt cells grown in culture in the lab.

In 1999, a small research study by John Losey
(Cornell) published in Nature reported that Monarch
butterfly larvae were harmed by eating pollen from
transgenic Bt corn (Figure 15.11). In the study, the
larvae were allowed to feed on milkweed leaves dusted
with pollen from the Bt corn. Because Monarch butterfly
larvae typically feed on the leaves of milkweed plants
near cornfields, the study raised the possibility that
pollen from Bt corn would harm monarchs in the wild.
However, later research challenged Losey’s conclusions.
Scientists from the USDA-ARS, U.S. and Canadian
universities, and industry and environmental groups
coordinated the development and review of grant
applications to test Losey’s conclusion. This subsequent
research, published in 2001 in a series of papers in the
Proceedings of the National Academy of Sciences, USA,
concluded that pollen from Bt corn posed little risk to
monarch populations under field conditions.

FIGURE 15.11 Mechanism of Bt toxicity in Monarch caterpillars.


The proposed mode of action of Bt includes the following: (1) Bt
spores and toxin crystals are ingested by the insect; (2) the crystals
dissolve in the alkaline midgut and the toxins are activated by
proteases (enzymes that cleave proteins); (3) the toxins bind to
specific receptors on midgut cells, causing pores to form in the
epithelial cells of the gut; (4) the larva dies, and the Bt spores
germinate and spread.


Starlink, a type of Bt corn developed by the
biotechnology company Aventis, attracted negative
publicity in 2000 when it became known that DNA from
the engineered Cry9C corn had been detected in tacos
for sale in stores in the United States and Mexico. The
detection of Cry9C in a human food source aroused
concern because Starlink had received approval for
use in animal feed but had not been approved for
human consumption. The level of contamination of the
tacos with Cry9C was tiny and was detected only by
using DNA probes that base pair to the Cry9C DNA
together with the polymerase chain reaction (PCR).
Even without more substantial evidence of harm, public
concern was raised, and Aventis collected and destroyed
all the corn products containing or contaminated with
Starlink. Eventually 34 people reported to the FDA that
they had suffered allergic reactions to the tacos
contaminated with Cry9C. However, the actual impact is
unclear, because none of the people tested had elevated
antibodies against the Cry9C protein, as would be
expected for an allergic reaction to Cry9C.

This experience shed light on the potential problems
that arise when products are approved for use in animal
feed but not for human consumption. In many cases,
the farmer, trucker, or processor does not keep the two
crops separate along the way from field to taco factory,
or the trucks are not fully cleaned between loads. In
the case of Starlink, all it took was a kernel remaining
in a truck when the next load of corn was loaded,
because even the tiniest contamination can be detected
using the highly sensitive polymerase chain reaction.


Biocontrol involves the use of one kind of organism to
control the population of a different type of organism, for
example, bacteria such as Bt used to control insects.


BENEFITS OF GM CROPS 


The International Service for the Acquisition of
Agri-Biotech Applications (ISAAA) is a nonprofit group
sponsored by philanthropic organizations and agricultural
companies. The ISAAA compiles information on
GMO crops worldwide and provides information that
supports the economic and environmental benefits of
adopting GMOs. These include the environmental and
economic benefits of switching to no-till or low-till
farming, which herbicide-tolerant GM crops make easier
because weeds can be controlled with herbicides instead
of by tilling. Tillage refers to turning the soil over
before a new crop is planted, a process that helps with
weed control and seed planting but that is also associated
with the loss of carbon from the soil, which adds
carbon dioxide to the atmosphere.

The ISAAA calculated the benefits from the reduced
use of herbicides and pesticides over the 10 years
from 1996 to 2005 and found that the largest global
benefit came from GM soy crops. The economic benefits
were higher for all GM crops used in developing
countries that have adopted agricultural biotechnology
(South Africa, Paraguay, India, and Mexico). In fact,
developing countries using GMOs accounted for 55%
of the economic benefits realized in 2005. In addition,
GM crops, together with the switch to no/low-till
practices, have decreased the use of herbicides and
pesticides by 7% and reduced the associated environmental
impact by over 15%. The assessment did not include the
expected reduction in greenhouse gas emissions that
accompanies the adoption of no/low-till methods and
less frequent applications of pesticides and herbicides.


Many farmers in developed and developing countries
have adopted the use of GM crops. Although activists
object to their use, especially in food crops, the economic
benefits to farmers and the environment have led to the
increased use of GMOs.


Pharming: Protein Drugs from Genetically
Altered Plants


Applications of biotechnology have produced
a number of novel protein drugs that have been
approved for human use by the FDA and by drug
regulatory agencies in the United Kingdom and Europe.
These biotechnology products were produced by
transgenic bacteria, yeast cells, or mammalian cells grown
in bulk culture in large fermentation vats. The use of
transgenic plants to produce recombinant proteins
was pioneered by a number of biotechnology companies
that see transgenic plants as relatively inexpensive
options compared to the bulk growth of cells in
tissue culture. Also, unlike bacteria, transgenic plants
can perform the posttranslational modifications of
proteins that are required for function, including the
addition of carbohydrates, phosphates, or other chemical
groups to proteins. The failure to properly modify
eukaryotic proteins can prevent them from interacting
with receptor proteins or can cause the protein to be
metabolized and cleared from the blood too quickly
or too slowly.

Research into the use of plants to produce transgenic
proteins is at an early stage, and most plant-produced
proteins are sold as research reagents and
not for use in humans. Dow AgroSciences gained USDA
approval to market a poultry vaccine that was produced
in plant cell culture. Scientists are also working
on the production of a vaccine in the edible portion of
a plant, the fruit. This would be a cost-effective way to
deliver the vaccine in developing countries with limited
refrigeration for vaccine storage and few skilled
medical workers to administer conventional vaccines
by injection. The plant is engineered with a transgene
that expresses a specific part of a bacterial protein
known to trigger antibody production in an animal.

Researchers at the University of Arizona developed
and tested an oral vaccine for humans against hepatitis
B (HepB), which they produced by transgene expression
in potatoes. The transgene expressed in potatoes
was the same hepatitis B virus DNA sequence that was
incorporated to make the standard injectable HepB
vaccine. The volunteers who ate the transgenic potatoes
(uncooked to avoid destroying the viral protein with
heat) developed antibodies to HepB, but those who ate
the conventional potatoes lacking the transgene (also
uncooked) did not. The potato vaccine was not pursued
further in part because of the requirement to eat the
potatoes raw. This study and another on potatoes
engineered to express a Norwalk virus protein showed,
surprisingly, that these proteins survive in the stomach
and can stimulate an immune response in the human body.
Additional edible vaccines are under development.

FIGURE 15.12 Transgenic duckweed makes antiviral interferon.
Interferon blocks the production of viruses by infected cells and was 
produced in duckweed, a simple plant that grows in water. 


Transgenic plants producing various biopharmaceuticals 
have been constructed and analyzed in research performed 
at both universities and companies. From 1992 to 2002, 
more than 20 different recombinant vaccines directed 
against disease-causing viruses or bacteria were expressed 
in transgenic fruits or vegetables. 


In both the United States and Europe, many
companies are developing programs to produce
biopharmaceuticals in transgenic plants. In some cases,
the transgenic plant products have advanced to trials
in humans, including a cancer vaccine produced
in tobacco plants and a stomach enzyme designed to
treat patients with cystic fibrosis and produced in corn.
Transgenic interferon, which blocks the production of
viruses in infected cells, was produced in duckweed,
a simple plant that grows in water (Figure 15.12). The
barriers to making biopharmaceuticals from transgenic
plants include not only the dosage, production quality,
and efficacy of the process but also the standard
challenges of cost-effective expression and purification
of the protein. Regulators and activists raised
environmental and safety concerns about the inadvertent
exposure of humans to the biopharmaceutical
products from transgenic plants; for instance, workers
involved in growing and processing transgenic plants
risk exposure to the transgenic products.

Also of concern is the possibility that a transgenic
plant expressing a biopharmaceutical product could
accidentally enter the food chain. This almost happened
in 2002, when soy for human consumption
was found to be contaminated with corn engineered
to produce an enzyme to treat cystic fibrosis. The
contamination was caused when the company ProdiGene
planted the soy crops in fields that had previously
been used to grow the transgenic corn. The contamination
was detected before the soybeans entered the
food chain, the crop was destroyed, and ProdiGene
paid a fine. The issues of containment and the need for
buffer zones between crops were raised for all
genetically modified crops. Understanding these concerns is
critically important for the safe production and use of
biopharmaceuticals produced by plants.


BIOTECHNOLOGY TOOLS HELP 
DIAGNOSE PLANT DISEASES AND DETECT 
TRANSGENES 


Polymerase Chain Reaction (PCR) Can 
Detect a Single DNA Molecule 


Whether a plant has a disease or carries a transgene
can be determined by using the polymerase chain
reaction (PCR) to amplify specific DNA sequences.
PCR amplification copies a single DNA molecule into
millions of identical DNA copies, providing scientists
with large quantities of specific amplified DNA for
study. Using PCR, scientists can specifically detect the
presence of transgene DNA in plant cell genomes.
PCR can also be used to detect the DNA or RNA
molecules made by a disease-causing organism in an
infected plant, even if these molecules are present in
extremely small amounts; RNA is first copied into DNA
by the enzyme reverse transcriptase so that it can be
amplified. Kary Mullis was awarded the Nobel Prize in
Chemistry in 1993 for his invention of PCR. PCR uses
thermostable DNA polymerase enzymes isolated from
organisms that grow at high temperatures. Through
multiple repeated cycles of heating, cooling, and DNA
replication, the PCR method generates millions of
copies of a specific DNA sequence (see Chapter 5).
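
The "millions of copies" follow from repeated doubling: in an ideal reaction each cycle copies every DNA strand once, so n cycles turn one molecule into 2^n. The sketch below is a simplified model, assuming a fixed per-cycle efficiency (1.0 means perfect doubling); the function name `pcr_copies` and the efficiency parameter are illustrative assumptions, and real reactions run below ideal efficiency and level off as reagents are used up.

```python
def pcr_copies(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate copies of a target sequence after a number of PCR cycles.

    Simplified model: each cycle multiplies the number of copies by
    (1 + efficiency). efficiency=1.0 means every template is copied once
    per cycle (perfect doubling). The plateau seen late in real reactions,
    when reagents run low, is ignored here.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # One starting molecule, 30 cycles, perfect doubling.
    print(f"{pcr_copies(1, 30):,.0f}")  # 1,073,741,824
    # The same reaction at an assumed 90% efficiency per cycle.
    print(f"{pcr_copies(1, 30, 0.9):,.0f}")
```

Even under this idealized model, 30 cycles turn a single molecule into about a billion copies, which is why a trace amount of target DNA can be detected.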


PCR is an extremely powerful research tool used in many
scientific fields, including agriculture. PCR can detect
viroids, which are tightly folded RNA strands that do not
encode proteins, in plant cells. PCR was used in 1993 to
identify 4 different viroids that infect hops and fruit, 16
viruses that infect cabbage, beans, corn, tomato, peas, and
gladiolus, and 7 bacteria, 11 fungi, and 3 nematodes that
infect a wide variety of plants.


Microarray Technology 


Scientists use microarray technology to study
large-scale gene expression, for example, to compare the
genes expressed in healthy cells with the genes
expressed in diseased cells (see Chapter 13). Other
microarray assays compare the genes expressed in
diseased cells with those expressed in diseased cells
treated with a specific drug. In agricultural
biotechnology, microarray technology has many
applications, including analysis of the expression of
transgenes in plant cells and tests to detect and
diagnose infectious pathogens such as plant viruses.
Microarray technology can be used to screen infected
cells to determine which type of plant virus might be
causing a particular plant infection.
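
A healthy-versus-diseased comparison like this is often summarized gene by gene as a log2 ratio of the two signals: positive values mean more expression in diseased cells, negative values mean less. A minimal sketch, assuming background-corrected spot intensities are already available; the function name, gene names, and intensity values are invented for illustration.

```python
import math


def log2_ratios(healthy: dict[str, float], diseased: dict[str, float]) -> dict[str, float]:
    """Return log2(diseased / healthy) for genes measured in both samples.

    Positive values: signal higher in diseased cells.
    Negative values: signal lower in diseased cells. Zero: no change.
    """
    ratios = {}
    for gene in healthy.keys() & diseased.keys():
        if healthy[gene] <= 0 or diseased[gene] <= 0:
            continue  # skip spots with no usable signal
        ratios[gene] = math.log2(diseased[gene] / healthy[gene])
    return ratios


# Hypothetical spot intensities (arbitrary units).
healthy = {"geneA": 500.0, "geneB": 1200.0, "geneC": 300.0}
diseased = {"geneA": 2000.0, "geneB": 1200.0, "geneC": 75.0}

for gene, value in sorted(log2_ratios(healthy, diseased).items()):
    print(gene, round(value, 2))
# geneA 2.0   (4-fold higher in diseased cells)
# geneB 0.0   (unchanged)
# geneC -2.0  (4-fold lower in diseased cells)
```

The log scale makes increases and decreases symmetric: a 4-fold rise and a 4-fold drop both have a magnitude of 2.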

To identify the virus, DNA strands from candidate
viruses are placed at different known positions on the
microarray grid. Scientists harvest total RNA from the
infected plants and copy it into cDNA probes containing
a radioactive or fluorescent tag for subsequent
detection. The microarray grid is incubated in buffer
containing the single-stranded cDNA probes, which base
pair to the DNA on the microarray slide corresponding
to any virus that expressed mRNAs in the infected
cells. This technique has been used to identify
plants infected by potato viruses and cucumber viruses.
A group of British and European academic, government,
and corporate scientists (Diag Chip) is developing
microarray grids that can be used to detect a large
number of the important plant pests and pathogens. The
USDA supports research on methods of pathogen
identification, and the National Institutes of Health (NIH)
and National Science Foundation (NSF) also provide
support for basic research on plant pathogens.
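
Reading the result comes down to checking which known grid positions light up after hybridization. The sketch below assumes a hypothetical layout that maps each spot to one candidate virus and a hypothetical fixed signal threshold; the function name and all values are illustrative, and real arrays use replicate spots, controls, and statistical cutoffs instead of a single number.

```python
def detect_viruses(spot_layout: dict[tuple[int, int], str],
                   signals: dict[tuple[int, int], float],
                   threshold: float) -> list[str]:
    """Return candidate viruses whose spots give a signal above the threshold.

    spot_layout maps a (row, column) grid position to the virus whose DNA
    was placed there. signals holds the measured tag signal at each
    position after the labeled cDNA from the plant has been hybridized.
    """
    hits = set()
    for position, virus in spot_layout.items():
        if signals.get(position, 0.0) > threshold:
            hits.add(virus)
    return sorted(hits)


# Hypothetical 2 x 2 grid with one candidate virus per spot.
layout = {(0, 0): "Potato virus X", (0, 1): "Potato virus Y",
          (1, 0): "Cucumber mosaic virus", (1, 1): "Tobacco mosaic virus"}
measured = {(0, 0): 120.0, (0, 1): 5400.0, (1, 0): 90.0, (1, 1): 150.0}

print(detect_viruses(layout, measured, threshold=1000.0))
# ['Potato virus Y']
```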


Microarray technology allows scientists to distinguish 
between the genes expressed by cells under different envi- 
ronmental and genetic conditions. 


Breeding Better Plants by Analyzing Genetic 
Variation 


Every year, plant breeders and farmers try to improve their
crop yield by selecting plants that are known to respond
well to environmental challenges such as drought or
flood, and to resist microbial pathogens and insect pests.
Advances in automated DNA sequence analysis
have allowed scientists to determine the sequences of
many plant genomes. As with human genomes, the DNA
sequences of different plant genomes are compared to
each other to search for genes associated with important
traits such as pathogen resistance. Resistance to a
specific pathogen might be controlled by a single gene,
but a complex trait such as crop yield is unlikely to be
controlled by a single gene. In fact, plant geneticists have
identified quantitative trait loci (QTL), which are specific
chromosome regions containing several genes that influence
a complex trait such as crop yield. Scientists use this
information to identify genes that are involved in
specifying certain traits. Once identified, a gene can be
isolated from the genome, sequenced, and studied further
to determine the specific function of the protein product.
Then scientists can make mutant versions of the gene and
test the impact of the mutant transgene on the plant.

The new field of bioinformatics (see Chapter 7) provides
scientists with powerful research tools for the
management and analysis of computer databases containing
complex genetic information from genome studies.
Once it became routine to sequence entire genomes,
scientists compared the sequences of individual human
genomes and found sequence variations that correlate
with specific genetic traits. Similar genetic variations
were found when the genome sequences of different
lines of the same plant species were compared. DNA
variations act as genetic markers in plant genomes and
can be correlated with specific traits or phenotypes of
the plant. This information is essential for plant breeding
programs to obtain the desired progeny. Commonly
used genetic markers based on genetic variation include
restriction fragment length polymorphisms (RFLP) and
single nucleotide polymorphisms (SNPs) (see Chapters
8 and 10).
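
One simple way to see whether a marker is correlated with a trait is to group plants by their genotype at that marker and compare the average trait value in each group. The sketch below uses invented genotypes and yields for a single hypothetical SNP, and the function name is illustrative; a real QTL analysis tests many markers and uses proper statistics to decide whether a difference is meaningful.

```python
from statistics import mean


def mean_trait_by_genotype(genotypes: list[str], trait: list[float]) -> dict[str, float]:
    """Average a trait (for example, yield) separately for each marker genotype."""
    if len(genotypes) != len(trait):
        raise ValueError("need one genotype per trait measurement")
    groups: dict[str, list[float]] = {}
    for genotype, value in zip(genotypes, trait):
        groups.setdefault(genotype, []).append(value)
    return {g: mean(values) for g, values in sorted(groups.items())}


# Hypothetical SNP genotypes and yields (bushels per acre) for six plants.
snp = ["AA", "AA", "AG", "AG", "GG", "GG"]
yields = [150.0, 154.0, 140.0, 144.0, 128.0, 132.0]

print(mean_trait_by_genotype(snp, yields))
# {'AA': 152.0, 'AG': 142.0, 'GG': 130.0}
```

A steady shift in the average across genotypes suggests the marker lies near a gene that affects yield; it does not show that the SNP itself causes the difference.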

The RFLP and SNP markers are powerful tools
used to study gene function, even though these DNA
markers are not usually associated with biological
function. RFLPs are simply DNA sequences that differ
between genomes of the same species. RFLPs involve
a DNA change that alters a restriction enzyme cleavage
site in the genome sequence. For example, if
genomic DNA from two individual plants of
the same species is digested with the same restriction
enzymes, comparison of the resulting DNA fragments
will reveal differences in fragment length that
indicate the position of the RFLP in the genome. RFLP
analysis performed on the genomes of plants from the
same species but with different phenotypes will provide
information that helps the geneticist to decide which
plant crosses (matings) will result in the desired
progeny. The inheritance patterns of the SNPs and RFLPs are
important in the search for genes that contribute to the
ability of plants to grow and tolerate extreme environmental
conditions. The search is made more difficult by the
likelihood that multiple interacting genes are responsible
for many plant traits.
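
An RFLP can be simulated by cutting two versions of a sequence at a restriction site and comparing the fragment lengths. The sketch assumes the EcoRI recognition site GAATTC, which EcoRI cuts between the G and the first A; the function name and the two short sequences are invented for illustration, and they differ by a single base that destroys one of the sites.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting at every occurrence of site.

    cut_offset is where the enzyme cuts within its site; EcoRI cuts
    G^AATTC, so the offset is 1. Only one strand is modeled, which is
    enough to compare fragment lengths between two plants.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    lengths, previous = [], 0
    for cut in cuts:
        lengths.append(cut - previous)
        previous = cut
    lengths.append(len(sequence) - previous)
    return lengths


# Invented sequences: plant_2 has a C -> T change in the second EcoRI site.
plant_1 = "AAAAGAATTCAAAAAAGAATTCAAAA"
plant_2 = "AAAAGAATTCAAAAAAGAATTTAAAA"

print(digest(plant_1))  # [5, 12, 9]
print(digest(plant_2))  # [5, 21]
```

The single-base change removes one cut site, so two fragments (12 and 9 bases) are replaced by one longer fragment (21 bases). On a gel, that length difference is the RFLP.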


BIOCONTROL ALTERNATIVES TO 
PESTICIDES AND FERTILIZERS 


After six years of laboratory testing, scientists at the
University of Minnesota are field-testing a strategy
using the Chinese wasp (Binodoxys communis) to control
soybean aphids. These aphids appeared in soybean
fields in Minnesota in 2000, where they cost state
farmers an estimated $200 million a year in lost crops,
including the added expense of spraying pesticides.
The Chinese wasps sting and kill only soybean aphids
and are not harmful to humans or pets. Other wasp
species from the same region of China are also under
evaluation.

Biological control (biocontrol) involves approaches 
that use parasites, predators, or pathogens to control 
an unwanted organism; in other words, the use of bugs,
infectious bacteria, or viruses to control pests. The appli- 
cation of biocontrol methods for plants includes the 
conservation of naturally occurring biological control 
agents, the release of additional native biological con- 
trol agents, or the classic introduction of an exotic, non- 
native, biological control agent to control an exotic pest. 

The first documented example of biocontrol
occurred in 1883 in Maryland, where the wasp
Apanteles glomeratus was introduced to control the
accidentally imported cabbageworm Pieris rapae. The
A. glomeratus wasp parasitizes the cabbageworm
larvae, killing the insect and saving the cabbage crop.
Early attempts at biocontrol were effective, so
further tests involving biocontrol took place in Florida
and California. Many states have active programs
to monitor pests and to identify potential biocontrol
agents, often using exotic organisms that originate
where the pest is native. The USDA Agricultural
Research Service provides a database of invertebrate
and microbial biological agents that are available to
control invertebrate, weed, and microbial pests. The
USDA regulates the release of these organisms to
guard against the use of imported biocontrol agents
that can become pests.


Biocontrol strategies are less successful in controlling 
weeds than in controlling insects and microbial pests. In 
the past some biocontrol agents became pests when they 
outgrew the native species and interfered with the growth 
of native plants. 


TERMINATOR TECHNOLOGY: ARE THE 
GM SEED COMPANIES EVIL OR 
PRUDENT? 


In 1990, scientists in Belgium constructed transgenic
plants carrying a gene with an anther-specific promoter
(from tobacco) located in front of the barnase
gene from the bacterium Bacillus amyloliquefaciens.
The barnase gene codes for a ribonuclease, an enzyme
that degrades RNA in the cell and is toxic to the tapetal
cells, which are required for pollen development in the
plant. The results of the tests were dramatic. The
transferred gene prevented normal pollen development and
caused male sterility in the transgenic plants. These
observations made it possible to use the transgenic
barnase gene to induce male sterility in other crops.
Called terminator technology by critics, male
sterility is a genetic condition that prevents plants from
producing functional anthers, pollen, or male gametes
(sperm cells) (Figure 15.13). Pollen grains protect the
sperm cells while they move from the stamens of one
flower to the pistil of the next flower, where fertilization
leads to the formation of seeds.

FIGURE 15.13 Pollen grains carry the male sperm cells. (A) Scanning electron microscope (SEM) image of pollen grains from a variety of common plants including sunflower (Helianthus annuus), morning glory (Ipomoea purpurea), prairie hollyhock (Sidalcea malviflora), oriental lily (Lilium auratum), evening primrose (Oenothera fruticosa), and castor bean (Ricinus communis). (B) Pollen is deposited on the ends of the anthers.

In 2003, news headlines announced that the terminator
technology developed by the large seed companies
was a new advance in agricultural biotechnology.
What raised considerable public controversy was the
idea that the seed companies planned to use terminator
technology to engineer the male sterility trait into
the genetically modified seeds of commercially important
crops. Perhaps this response was not surprising
considering the earlier negative public reaction to
genetically engineered plants and other GMOs, which
many people saw as a threat to agriculture, the
environment, and human and animal health in developed
and developing countries.

Some farmers worried that if they grew crops using
the seed company’s sterile “terminator” plants that do
not make seeds, they would be unable to save seeds
from one harvest to use for the next planting. They
were very concerned that they would be forced to buy
seeds each year from the seed companies using the
terminator technology. People were also concerned
that the transgene responsible for male sterility would
be accidentally transferred into other plant genomes
and spread the male sterility trait to other crops. As is
often the case with new technologies, at least some of
the resistance to terminator technology was caused by
a general lack of understanding about male sterility
in plants, in part because the companies failed to
clearly explain the basic science behind male
sterility. The molecular mechanisms responsible for
male sterility vary in different plants, providing the
seed companies with a choice of paths to achieve the
goal of male sterility. For example, the Pioneer seed
company created corn with the male sterility trait
by inserting the E. coli dam gene into the plant cells
under the control of a promoter that turns on expression
of the dam gene only in the anther cells. The dam
gene encodes a DNA methylase enzyme (Dam methylase)
that transfers a methyl group (-CH3) onto the
A base in the GATC sites in the plant genome DNA.
Methylation interferes with normal expression of the
plant genes, causing the anthers to die and conferring
male sterility.
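
Dam methylase recognizes the four-base sequence GATC and methylates its adenine. The sketch below lists the positions in a DNA strand that such an enzyme would target. The function name and example sequence are invented for illustration. Because GATC is its own reverse complement, every site appears on both strands, so scanning one strand finds all of them.

```python
def dam_target_adenines(sequence: str) -> list[int]:
    """Return 0-based positions of the adenine in each GATC site on this strand."""
    seq = sequence.upper()
    positions = []
    i = seq.find("GATC")
    while i != -1:
        positions.append(i + 1)  # the A is the second base of G-A-T-C
        i = seq.find("GATC", i + 1)
    return positions


example = "TTGATCAAGGATCC"
print(dam_target_adenines(example))  # [3, 10]
```

In a genome of hundreds of millions of bases, GATC occurs very often by chance, which helps explain why expressing Dam methylase in anther cells can disrupt the expression of many plant genes at once.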

Traditionally, farmers would save seeds from one
harvest for planting in the next, a process called brown
bagging, but this practice has some genetic risk and
has become less popular. The hybrid F1 generation
created by crossing two parental lines (P1 × P2) often
exhibits desirable traits that exceed those of both
parents, which is called hybrid vigor or heterosis. When
farmers save seeds from a crop one year to plant the
crop the next year, the crop planted with the saved
seeds is the F2 generation. Unfortunately, the F2
generation is genetically unpredictable and characteristically
has lower yield. Corn crops have been bred
for thousands of years by picking and breeding the
plants with the most desirable traits. The practice of
brown bagging corn in the United States is essentially
unknown now because almost all modern corn crops
are hybrids. Farmers buy hybrid corn seeds each year
based on the results of field-testing the hybrid strains in
their locations. There are test plots of hybrids and now
GM crops planted across the United States.
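
The loss of hybrid uniformity in the F2 generation follows from Mendelian segregation. As a simplified one-gene sketch (real hybrid vigor involves many genes), crossing true-breeding parents AA × aa gives an F1 that is all Aa. Letting F1 plants pollinate one another then gives F2 offspring in a 1:2:1 ratio, so only half resemble their hybrid parents at that gene. The gene symbols and the function name `cross` are placeholders for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Offspring genotype frequencies for a one-gene cross, e.g. cross("Aa", "Aa").

    Each parent passes on one of its two alleles with equal probability.
    Genotypes are written with the uppercase (dominant) allele first.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in sorted(counts.items())}


f1 = cross("AA", "aa")   # two true-breeding parental lines
f2 = cross("Aa", "Aa")   # seed saved from the F1 hybrid

for genotype, freq in f1.items():
    print("F1", genotype, freq)
# F1 Aa 1
for genotype, freq in f2.items():
    print("F2", genotype, freq)
# F2 AA 1/4
# F2 Aa 1/2
# F2 aa 1/4
```

With many genes segregating at once, the chance that an F2 plant matches the hybrid at every gene shrinks rapidly, which is one reason saved hybrid seed gives a variable crop.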

Initially the seed companies introduced male sterility
traits into corn and other crops as an answer to
the criticism that the transgenic DNA in the GM crops
could be transmitted into the genomes of unmodified
crops. Other approaches to conferring sterility involve
changing the mitochondria or chloroplasts in the plant
cells and are being tested in corn, soybeans, and other
crops. Because in most crop plants mitochondria and
chloroplasts are inherited only from the mother, the
foreign transgene that confers sterility in these cases
would not be present in the male sperm cells in pollen.


The possibility that genetically engineered plants may cre- 
ate environmental or health risks is real, and techniques 
such as male sterility and transgenic chloroplasts or mito- 
chondria are being used to reduce the risk of contamina- 
tion by GM crops. 


BIOFUELS 


There are many serious problems associated with the
use of fossil fuels, including the high cost of oil and
gasoline and the risk of global warming. These problems
triggered an international race to develop alternative
sources of fuel to run cars, heat homes, and power
factories. But the pressure to quickly solve this worldwide
problem has already raised many dilemmas. A good example
was the strong political and public pressure to produce
ethanol (EtOH) from corn plants, arguably at the expense of
the human food supply. Until the ethanol boom in the
2000s, more than 60% of the corn harvested annually
in the United States was fed to domestic cattle, hogs,
and chickens raised for human consumption. Many
thousands of food items contain corn or corn byproducts.
In Mexico, where corn is a staple food, the price of
tortillas skyrocketed when corn grown in the United
States and Mexico was diverted into ethanol production.
Of course, the corn farmers were happy with this
development, since the ethanol processors bought more
corn at much higher prices. As the demand for ethanol
increased, so did the demand for corn. As a result,
however, the cost of corn used for animal feed and as a
source of human food rose dramatically. By 2006 the price
of corn had doubled. Consumers in the United States felt
the upward pressure on food prices. In developing
countries the high price of corn motivated many farmers
to clear more land to grow additional corn crops by
burning down jungle and rain forest.

In 2007, the President of the United States made it
a national goal to reduce American gasoline consumption
by 20% in the next decade. The new mandatory
fuel standard requires a capacity of 35 billion gallons
of renewable and alternative fuels by 2017, a challenging
goal given that the current oil consumption in the
United States is more than 7 billion barrels each year.
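
Because the fuel target is stated in gallons and oil use in barrels, putting both in the same unit makes the comparison clearer. A U.S. oil barrel holds 42 gallons. The sketch assumes annual U.S. oil consumption of about 7 billion barrels (roughly 20 million barrels per day), a round figure chosen for illustration. It compares volumes only and ignores the fact that a gallon of ethanol delivers less energy than a gallon of gasoline.

```python
GALLONS_PER_BARREL = 42  # standard U.S. oil barrel


def gallons_to_barrels(gallons: float) -> float:
    """Convert a liquid volume in U.S. gallons to oil barrels."""
    return gallons / GALLONS_PER_BARREL


target_gallons = 35e9      # renewable and alternative fuel goal for 2017
annual_oil_barrels = 7e9   # ASSUMED round figure for annual U.S. oil use

target_barrels = gallons_to_barrels(target_gallons)
share = target_barrels / annual_oil_barrels

print(f"{target_barrels / 1e9:.2f} billion barrels")  # 0.83 billion barrels
print(f"{share:.0%} of assumed annual oil use")        # 12% of assumed annual oil use
```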

Concerns about global warming due to greenhouse
gas emissions from fossil fuels, combined with the
high price and uncertain availability of oil from the
Mideast, have generated interest and investment in
biofuels, sustainable fuels that come from plants. In
2008, the United States consumed 12 million barrels
of oil from fossil fuels. Considerable political pressure
encouraged the federal government to focus on
the production of ethanol derived from corn. To support
this effort, ethanol (10%) was added to gasoline
as an alternative oxygenate to methyl tertiary-butyl ether
(MTBE), which was widely used as an additive from
the 1970s to the 1990s to reduce carbon monoxide air
pollution. However, the EPA later concluded that MTBE
is a potential human carcinogen at high doses, and
when MTBE was detected in water wells in a number
of states as a result of leaking underground tanks at gas
stations, it became essential to find an alternative
oxygenate for gasoline.


Box 15.2 Green Dreams: Making Fuel from Crops Could Be Good for the Planet—After a Breakthrough or Two


National Geographic, 2007

http://ngm.nationalgeographic.com/2007/10/biofuels/biofuels-text/2

By Joel K. Bourne, Jr.
In 2007, when Dario Franchitti drove his 670-horsepower,
orange-and-black Indy car across the finish line at the
Indianapolis 500, he became the first driver ever to win the
Indy auto race on a fuel of pure ethanol. The change in the
fuel used in this famous race is another indication of the rush
to biofuels that took place in the mid-2000s, the shift toward
homemade gasoline and substitutes for diesel fuel made from
crops such as corn, soybeans, and sugarcane.

Proponents see biofuels as more than a solution to our
dependence on Middle East oil; biofuels also offer an excellent
way to curb carbon dioxide emissions. Unlike the carbon in fossil
fuels, the carbon in biofuels comes from the atmosphere; it is
captured by plants as they grow. In theory, some day burning a
tank of ethanol could make driving an Indy car carbon neutral.

The industry and the federal government turned to
ethanol and more recently biological sources of diesel
as alternative additives for fossil fuels. Diesel fuel, which
is ignited by compression rather than by a spark like
gasoline, was patented as a fuel in 1893. The shift in the
industry to ethanol had a huge impact on corn farmers. Corn
acreage rose from 78 million acres planted and 70 million
acres harvested in 2006 to 93.6 million acres planted
and 83 million acres harvested in 2007.

Corn grain was chosen as the first source of ethanol for a number of reasons. First, the process used to ferment the corn and distill the ethanol was well established. Also, despite evidence that ethanol can damage engines, automakers and consumers have long accepted gasoline with 15% ethanol, and gasoline containing 85% ethanol (E85) is available at gas stations. U.S. car manufacturers sell models that will run on E85, but these vehicles suffer a 15% drop in fuel economy. Research is also under way to identify other efficient sources of ethanol that might replace corn grain as the main source of ethanol for fuel, including cellulose from the stalks of corn and other crop plants, rice hulls, and bagasse, the sugarcane stalks that remain after the sugar juice has been extracted (Figure 15.14).


FIGURE 15.14 Sugarcane stalks—source of biofuel? The sugarcane 
stalks that remain after the sugar juice has been extracted are pos- 
sible sources of a new biofuel. 
Biodiesel is a fuel that can be derived from soybeans, canola, sunflower, palm oil, and other crops, as well as from used vegetable oil from restaurants. Interest in biodiesel grew in the United States because biodiesel offers the potential to reduce dependence on imported petroleum and to reduce the negative impacts of global climate change by lowering the net carbon dioxide emissions from transportation. Advocates of biodiesel have gained publicity and legislative tax incentives. New biodiesel fuels must meet EPA requirements and the standards of the American Society for Testing and Materials. How easily a diesel fuel will ignite under pressure is reflected by the cetane number of the fuel; a higher cetane number indicates that the fuel ignites more quickly. The cetane number for petroleum-based diesels ranges from 40 to 42, and biodiesel made from soybeans has a cetane number of about 51.


Ethanol made from corn quickly became a popular idea and a widely accepted component of gasoline for automobile fuel in the United States. But corn ethanol has disadvantages that have inspired scientists to look for alternatives.


Three factors are considered when deciding how 
beneficial different alternative fuel sources are to the 
environment: 


• The net energy balance (NEB) between the two fuels (diesel compared to biodiesel, for example)

• The levels of greenhouse gas emissions (GHG)

• The production of useful co-products (such as glycerol made from soybeans)


These factors are used to compare the costs and benefits of developing different alternative fuels, such as ethanol and biodiesel. The net energy balance is calculated by subtracting the energy input needed to produce a fuel from the energy output the fuel provides (the NEB ratio instead divides the output by the input); it measures whether or not the fuel requires more energy to produce than it yields. Compared with corn grain ethanol, production of soy biodiesel would release smaller amounts of pollutants such as phosphorus and pesticides. Currently, producing and burning corn grain ethanol releases 87% of the greenhouse gases released by the amount of fossil fuel that generates an equivalent amount of energy; soybean biodiesel production and combustion release only 59% of the greenhouse gases.
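The net energy balance and greenhouse gas comparisons above are simple arithmetic, sketched below in Python. This is a minimal illustration, not a life-cycle analysis: the function names are hypothetical, the energy figures are placeholders rather than measurements, and only the 87% and 59% greenhouse gas figures come from the text.

```python
def net_energy_balance(energy_output_mj: float, energy_input_mj: float) -> float:
    """NEB: energy the fuel provides minus the energy used to produce it."""
    return energy_output_mj - energy_input_mj


def neb_ratio(energy_output_mj: float, energy_input_mj: float) -> float:
    """NEB ratio: output divided by input; a value above 1 means a net energy gain."""
    return energy_output_mj / energy_input_mj


def ghg_reduction_percent(relative_emissions_percent: float) -> float:
    """Turn 'releases X% of the fossil-fuel emissions' into a percent reduction."""
    return 100.0 - relative_emissions_percent


# Hypothetical energy figures (megajoules) for illustration only, not measured data.
output_mj, input_mj = 125.0, 100.0
print("NEB:", net_energy_balance(output_mj, input_mj), "MJ")
print("NEB ratio:", neb_ratio(output_mj, input_mj))

# Relative greenhouse gas emissions quoted in the text (fossil fuel = 100%).
for fuel, pct in {"corn grain ethanol": 87.0, "soybean biodiesel": 59.0}.items():
    print(f"{fuel}: {ghg_reduction_percent(pct):.0f}% less greenhouse gas than fossil fuel")
```

With these placeholder numbers the fuel yields 25 MJ more than it consumes (a ratio of 1.25). A negative NEB, or a ratio below 1, would mean the fuel costs more energy to make than it delivers. The 87% and 59% figures correspond to reductions of 13% and 41%.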

In 2005, the production of corn grain ethanol accounted for 1.7% of gasoline use in the United States, and soy-derived biodiesel accounted for 0.9% of diesel use. It was estimated that if all of the corn and soybean production in 2005 were used to make biofuels, it would provide only 12% of the gasoline and 6% of the diesel used in the United States. With current technologies, it is not possible for the United States to decrease its dependence on foreign oil sufficiently, or to reduce greenhouse gas emissions enough to make a difference in climate warming. In addition, there is a risk of food shortages and negative environmental consequences. With the rising demand for biofuels made from food crops, the price of food will also continue to increase.


DNA and Biotechnology


FIGURE 15.15 Fungus as a source of biofuels. (A) Microscope image of T. reesei cells with hyphal extensions containing vesicle membranes (red) and chitin that makes up the cell wall (blue); (B) Studies on fungi focus on reducing the high cost of converting lignocellulose (shown here) into fermentable sugars that can be used to generate biofuels.

Other sources of ethanol in addition to or instead 
of corn remain experimental. The Department of 
Energy has committed $385 million to six companies 
to help support research on developing the transgenic 
plants and associated technologies needed to produce 
ethanol from cellulose and lignocellulose. During the 
process of biofuel production, lignin physically blocks 
the access of enzymes that are needed to break down 
the polysaccharides in the cellulose. To solve this prob- 
lem, scientists are altering the genes involved in lignin 
production. The first strategy involved inserting a trans- 
gene into plants to reduce the lignin amounts, but the 
transgenic corn and sorghum plants in this experiment 
grew very slowly. In another approach researchers 
engineered transgenic corn with genes that produce 
bacterial enzymes to break down the cellulose at 
times in the life cycle of the plants that avoid negative 
effects on growth. So far, high expression levels of this transgene have not been achieved. The productivity and net energy balance of this new biofuel will not be known until the technologies are tested both in pilot plants and in manufacturing facilities.


Studies are underway to identify new potential 
sources of biofuels using strains of fungus such as T. 
reesei (Figure 15.15). This research focuses on reducing 
the prohibitively high cost of converting lignocellulose 
to fermentable sugars to produce the next generation 
of biofuels. The plan is to use fungi to generate industrial enzyme "cocktails," combinations of enzyme proteins that will enable more economical conversion of biomass into biofuel. Potential feedstocks include the perennial grasses Miscanthus and switchgrass, wood from fast-growing trees like poplar, agricultural crop residues, and municipal waste.

The long-range goal of this research is to replace
the gasoline-dependent transportation sector of the 
United States economy with a carbon-neutral source 
of fuel from plants or microorganisms. 

The use of algae to produce biofuels received recent 
support from the Department of Energy, which had 
previously funded research on renewable transporta- 
tion fuels from algae (1978 to 1996). Numerous algae 
strains were collected and characterized and a number 
of algae genes involved in oil production were identified and cloned (Figure 15.16). When the program was canceled, biofuel produced from algae could not compete with the price of oil at that time. Of course, the recent surge in oil prices and recent advances in biotechnology have refueled the race to develop commercially competitive and affordable algae biofuels. In 2007, Chevron Corporation entered into a research and development (R&D) agreement with the Department of Energy designed to identify algal strains and biochemical processes that can be used to produce algae-based biodiesel fuel. Whereas several companies like Algae BioFuels, Greenfuels, and Solix Biofuels are independently working toward this goal, the LiveFuels Alliance differs in that it is a collaboration led by Sandia National Laboratories and the U.S. Department of Energy National Laboratory, and it will sponsor dozens of research programs and hundreds of scientists. This is the largest national effort focused on producing commercial biocrude fuels.


FIGURE 15.16 Algae are a potentially rich source of future biofuel. (A) The LiveFuels Alliance is tapping into the oil-producing potential of algae with an ambitious initiative to replace millions of gallons of fossil fuels with algae-based biofuel by 2010; (B) Algae cells growing in liquid culture.

The scientists at the LiveFuels Company plan to 
refine the processes needed to increase algae oil pro- 
duction at competitive prices and focus on the spe- 
cialized aspects of the process involving the algae 
cells, including breeding algae to find the best high- 
fat strains, refining the fat and oil extraction process, 
and developing cost-effective harvesting techniques. 
Extracting oil from algae cells is basically the same as 
any other biofuel extraction technology, but cultivat- 
ing the algae has other advantages. The algae cells do 
not require prime agricultural land and the potential 
biofuel yield from algae far exceeds other renewable 
sources. Depending on the strain, algae are predicted to yield between 1,000 and 20,000 gallons of oil per acre, far more than soy, which produces about 117 gallons of oil per acre. Growing algae on 20 million to 40 million acres of agriculturally marginal land in the United States could produce enough oil to replace imported oil and preserve 450 million acres of current farmland for growing food crops. Under the right conditions, algae cells divide rapidly in fresh, brackish, or even waste water. Algae cells are nontoxic and biodegradable, and they have the potential to supply a large fraction of the domestically grown biofuels available in the future.
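The per-acre comparison above can be made concrete with a little arithmetic. The Python sketch below uses only the yield and acreage figures quoted in the text; the function names are hypothetical, and it assumes (as a simplification) that total output scales linearly with planted area.

```python
SOY_GALLONS_PER_ACRE = 117                   # soy oil yield quoted in the text
ALGAE_GALLONS_PER_ACRE = (1_000, 20_000)     # predicted algae range quoted in the text


def fold_advantage(algae_yield: float, soy_yield: float = SOY_GALLONS_PER_ACRE) -> float:
    """How many times more oil per acre algae would give than soy."""
    return algae_yield / soy_yield


def total_gallons(acres: int, gallons_per_acre: int) -> int:
    """Total oil, assuming yield scales linearly with planted area (a simplification)."""
    return acres * gallons_per_acre


for y in ALGAE_GALLONS_PER_ACRE:
    print(f"{y:,} gal/acre is about {fold_advantage(y):.1f} times the soy yield")

for acres in (20_000_000, 40_000_000):
    low, high = (total_gallons(acres, y) for y in ALGAE_GALLONS_PER_ACRE)
    print(f"{acres:,} acres: {low:,} to {high:,} gallons of oil")
```

Even the low end of the predicted algae range is roughly 8.5 times the soy yield, and the high end is roughly 171 times. Real yields would depend on strain, climate, and harvesting losses, none of which this sketch models.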


Green algae are a good source of a new biofuel as they can
be grown in large cultures and do not compete with ani- 
mals or people as a food source. Scientists predict that in 
the future algae-based biofuel will replace millions of gal- 
lons of fossil fuels used in the United States. 


DEVELOPING COUNTRIES 


The use of transgenic crops in developing countries 
might not make much sense, particularly since most 
farms are small and must support a number of crops 
without the aid of pesticides or fertilizer. Land in tropical areas is often nutrient-poor and acidic, with high aluminum and low phosphorus content. Drought is a problem in many developing countries, where farmers depend on rain because there is no widespread irrigation. Although GM crops may offer only marginal improvements in yield, these gains might be substantial for farmers with small farms in developing countries; however, the economic and infrastructure resources that make GM crops attractive to large-scale farms are generally not available in developing countries. The international seed companies have been slow to respond to the needs of farmers in developing countries.

The 2001 Human Development Report from the 
United Nations supported the use of transgenic crops 
by farmers in developing nations and urged the nations 
to identify and develop the transgenic crops that are 
well suited to growing conditions in their local regions 
such as drought and nutrient-poor soil. The World Bank reviewed the Consultative Group on International Agricultural Research (CGIAR), which supports and connects the scientists at 15 international research centers, and concluded that the CGIAR group had not responded sufficiently to the biotechnology revolution, underscoring the continuing problems in developing countries. Just as issues of intellectual property (patents) and profit have slowed the use of medicines and vaccines to treat people in the developing world, these problems have slowed the development of transgenic crops suited for use in developing countries. Recently, more attention has been paid to diseases of developing countries, along with efforts to publicize the astounding advances in agricultural biotechnology that should be made available to benefit the world’s poor.


SUSTAINABLE AGRICULTURE 


Sustainable agriculture is a system of farming that 
takes into account the environmental, social, and eco- 
nomic issues relating to the growth and distribution of 
crops. Modern, large-scale (factory) farms that usually 
grow one (monoculture) or just a few crops and rely 
completely on the use of fertilizers, pesticides, and 
herbicides are not conducting sustainable agriculture, 
even if they use genetically modified plants. Yet these 
farming practices have successfully provided people in 
the developed world with abundant, inexpensive food. 

Sustainable agriculture is often used synonymously 
with organic farming and organic food, but this is not 
accurate. It does not take much food-shopping experience to realize that organic products are significantly more expensive than non-organic food. This is because organic products are raised, grown, or processed without the use of synthetic hormones, pesticides, or fertilizers, which is overall a less efficient and therefore more expensive process. But sustainable agriculture is not optional if humans are to become better stewards of the earth. Scientists have a responsibility to continue to focus on developing the best tools, modern or traditional, that will effectively protect the security of the world’s food supply and the environment of the entire planet, and provide for the well-being of people in resource-rich and resource-poor lands. Biofuels
play a large role in achieving these goals. Success will 
require collaborations between developed and under- 
developed countries, involving governments, univer- 
sities, and representatives of the companies and the 
public sector. 


Future success in the equitable use of the Earth’s resources 
will require people to better balance the use of modern 
biotechnology and increased productivity with ethical con- 
cerns about environmental impact, profitability, and social 
equality worldwide. 




SUMMARY 


This chapter covered ways that advances in the con- 
cepts, tools, and methods of molecular biology and 
recombinant DNA technology have influenced the 
rapidly growing field of agricultural biotechnology. 
Researchers routinely identify, isolate, and character- 
ize genes and their protein products from microorgan- 
isms and plants as well as animals and humans. DNA 
cloning methods have been developed to insert genes 
into DNA vectors used to introduce new genes into the 
genomes of plants to generate transgenic plants includ- 
ing corn, soybean, sugar beets, cotton, canola, squash, 
and many others. Transgenic plants are designed to improve farm crops: they are engineered to resist infection by some viruses and bacteria, to resist attack by insects, and to grow well in the presence of herbicides. Scientists have developed different ways to introduce the vector DNA carrying a transgene into the plant genome, including transformation with an Agrobacterium plasmid and physical methods such as a gene gun that shoots DNA-coated projectiles into the plant cells.

The introduction of genetically modified plants into 
the environment raised concerns from some environ- 
mentalists about the unknown impact of transgenic 
plants on the environment and human, animal, and 
plant health. The USDA, EPA, and the FDA agreed 
on rules and procedures to reduce the potential risks 
posed by transgenic plants. Despite the continuing 
controversy in both the United States and in Europe, 
the use of GM plants has increased in the United 
States and in other developed countries. Researchers 
have genetically engineered plants to produce proteins 
for medical use in humans and animals, and several 
are now in clinical trials to assess safety and effectiveness. Oral vaccines have been developed using transgenic potatoes and bananas that can be eaten instead
of injected. A vaccine against the bacteria that cause 
tooth decay is being developed in tobacco plants. 

The use of transgenic plants to produce protein 
drugs for use in humans intensified public concerns 
about the accidental exposure to transgenic plants, 
especially if the genetically altered plant is a food or if 
the GM pollen could unintentionally spread the altered 
genomes to human food crops. Genetically altered 
chloroplasts may be a preferable option because, in most crop plants, chloroplasts are inherited only through the maternal line, so a chloroplast transgene is generally not transmitted through the male sperm cells in the pollen. Like human and animal genes, plant genes and
associated diseases are linked to known RFLP and SNP 
genetic markers in plant chromosomes that can be vis- 
ualized through methods involving PCR amplification. 

The sudden increase in gas prices in 2008, the shortage of fossil fuels, and the critical issue of global warming have led to new investment in plant-source fuel alternatives. Ethanol derived from corn grain and biodiesel derived from soy and other plants are currently available to the consumer, but these biofuels have put additional pressure on the human food supply, especially corn-based products. However, possible alternatives, such as ethanol derived from corn stalks instead of corn grain, and oil made from algae, are mostly experimental at this time.

The United Nations has urged the international 
community of scientists to participate in the transfer 
of appropriate agricultural biotechnology to develop- 
ing countries. The use of transgenic crops and other 
aspects of agricultural biotechnology in developing 
countries could improve the yield of subsistence farms,
but to develop an agricultural industry that will sub- 
stantially improve the quality of life will require the 
collaboration of governments, companies, academics, 
private-sector stakeholders, and people who are often 
involved in tribal warfare and suffering from wide- 
spread diseases. 


REVIEW 


To test your knowledge of the concepts in this chapter, 
answer the following questions: 


1. What methods are used to deliver DNA into plant 
cells to create transgenic plants? 

2. Why are plant cells more difficult to transform 
with DNA than animal cells? 

3. How do the Cry proteins kill insects? 

4. What federal government agencies must approve 
plans to plant a transgenic crop in an open field? 

5. What are the concerns expressed by people 
opposed to use of the genetically modified crops? 

6. How are DNA microarrays used to detect the 
pathogen responsible for an infected plant? 

7. Explain the basic mechanism behind PCR, and 
describe the basic characteristics of the final PCR 
products after many PCR cycles. 

8. What special concerns focus on the transgenic 
plants that produce protein drugs for human use? 

9. What are the limitations of ethanol made from 
corn as a fuel to replace fossil fuels and reduce 
greenhouse gases? 

10. How do scientists explain the mechanism of 
genetically engineered male sterility, and why are 
some people opposed to the concept of terminator 
technology? 
Eliminating Racial and Ethnic Health Disparities


Centers for Disease Control 

African-American, American Indian, and Puerto Rican 
infants have higher death rates than white infants. In 2000, 
the black-to-white ratio in infant mortality was 2.5 (up from 
2.4 in 1998). This widening disparity between black and 
white infants is a trend that has persisted over the last two 
decades. 


African-American (black) children are two and a half times as likely to die as infants as Caucasian (white) children in the United States. Could this disparity possibly be caused by a difference in genetics due to race? Some people argue that the differences in health statistics result from the different genetics that define different racial populations. Others maintain that this idea is nonsense, and attribute the differences in health statistics not to genetics, but to the environmental, social, and economic conditions under which many black people live in the United States.

This chapter will examine the scientific evidence 
available to determine whether race is a genetic or bio- 
logical concept. In cases where there are clear differences 
in health issues between black and white populations, 
can these differences be explained by human genetics? 


LOOKING AHEAD 


In this chapter, we will examine the idea that different 
human races can be defined in terms of genetics or 
biology. On completing the chapter, you should be able 
to do the following: 


• Explain why it is problematic to define race in terms of human genetics.

• Describe how modern scholars have revised the depiction of race in ancient texts.

• Name some human diseases that have been labeled as ‘racial’ or ‘ethnic’ diseases, and explain whether or not it is accurate to use these labels to describe these diseases.

• Describe whether there is more genetic variation occurring within groups called races or between these groups.

• Explain how the geographical origins of human populations are represented by differences in human genome DNA sequences.

• Explain some of the complexities involved in establishing accurate relationships between intelligence, IQ, and race.


INTRODUCTION 


“We are living through an era of the ascendance of biology, 
and we have to be very careful,” said Henry Louis Gates 
Jr., director of the W. E. B. Du Bois Institute for African and 
African American Research at Harvard University. “We will 
all be walking a fine line between using biology and allowing 

it to be abused.” 
(In DNA Era, New Worries About Prejudice, Amy Harmon 
2007, New York Times) 


Tremendous advances in molecular biology and genet- 
ics, including the complete human genome DNA 
sequence, have recently shed additional light on 
the subject of race and have greatly illuminated the 
immensely complex issues involved. The goal here 
is to understand what is meant in biological terms by 
“race” and to understand the implications of ‘race’ on 
advances in medicine and biotechnology. For exam- 
ple, are there important biological differences between 
people that can be ascribed totally to race? Should a 
patient's race be considered when prescribing drugs 
to treat diseases? Is there any situation in which the 
consideration of race is necessary for the delivery of 
appropriate medical care? These types of questions 
may be difficult to discuss, but race discrimination in
the United States has had a tremendous impact on 
medicine and healthcare. In this chapter we address 
race from a biological standpoint to better understand 
the connections between race, healthcare, and medi- 
cine and to help promote equal access to the amazing 
advances in biotechnology and medicine that are gen- 
erated by research in modern genome science. 

In 1972 Richard Lewontin, a leading evolutionary 
geneticist, published a study comparing the amino 
acid sequences of different human proteins (Figure 
16.1). Lewontin found that at least 85% of the genetic 
differences between human individuals have nothing 
to do with race, as this genetic variation is present 
even between individuals of the same race. Lewontin’s 
results on protein sequence variation were later confirmed by research on DNA genomes, including the human genome (see Chapter 6).


FIGURE 16.1 Richard C. Lewontin. One of the most brilliant evolutionary biologists of his time, Lewontin is also a leading critic of the misuse of scientific discoveries. He exposes common misconceptions that interfere with the ability of people to understand biology and evolution. Lewontin’s work contributed to our understanding of the functional connection between genes and the environment and the role of genes and DNA in evolution.


In 2001, when the first DNA sequence of the entire human genome was released, scientists initially concluded that humans are remarkably similar to each other and that the DNA genomes of any two people are about 99.9% identical. Since that time, researchers have updated the sequence of the 3.2 billion base pairs in the human genome. They found that the genomes of two individuals are only about 99.5% identical, so the DNA sequence differences among individual humans are much greater than previously thought (Figure 16.2A). About 10 million single nucleotide polymorphisms (SNPs), single-base DNA differences, have been identified in human genomes and are extremely important tools in many applications of DNA technology (see Chapters 6, 8, 10). The sequence variations between human genomes are used in DNA forensics to identify suspects and to exonerate innocent people who were wrongly imprisoned, in paternity testing to identify biological family relationships, as a DNA ‘barcode’ to track meat poached from endangered species, and in medical research to study the genes associated with human diseases. A single base pair mutation in a human gene can cause a life-threatening disease such as sickle cell anemia.
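Because a SNP is simply a position where two aligned sequences carry different bases, counting SNPs and computing percent identity is straightforward. The Python sketch below is a minimal illustration under stated assumptions: the two short fragments are hypothetical examples chosen for illustration, the helper names are invented, and it compares the 99.5% identity figure with the 99.9% figure widely reported when the draft genome was released in 2001.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[int]:
    """Return the 0-based positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which the two sequences carry the same base."""
    differences = len(snp_positions(seq_a, seq_b))
    return 100.0 * (len(seq_a) - differences) / len(seq_a)


# Two short hypothetical aligned fragments, chosen for illustration.
person_1 = "ATGGTGCACCTGACTCCTGAGGAG"
person_2 = "ATGGTGCACCTGACTCCTGTGGAG"
print(snp_positions(person_1, person_2))                 # one single-base difference
print(f"{percent_identity(person_1, person_2):.2f}% identical")

# Scale genome-wide identity figures to 3.2 billion base pairs.
GENOME_BP = 3_200_000_000
for identity in (99.9, 99.5):
    differing = GENOME_BP * (100 - identity) / 100
    print(f"{identity}% identical -> about {differing:,.0f} differing base pairs")
```

Across a whole genome, 0.5% of 3.2 billion base pairs is about 16 million differing positions, compared with about 3.2 million at 99.9% identity. Real genome comparisons must also handle insertions, deletions, and sequence alignment, which this sketch deliberately ignores.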

The discussion of genes and race in this chapter will include three parts. First, we will explore the history of race to find out if race has always been a concept in human society. Second, we will examine the genetic definition of ‘race’ and look for biological evidence to support the concept of racial diseases. Current research on the genomes of people from all over the world has revealed important information about the frequency of genetic variation among individual human genomes (see Figure 16.2B).


FIGURE 16.2 Genetic variation: Small sequence differences between human genomes. (A) Comparison of the sequences of different human genomes shows that most of the sequences are identical. There are also about 10 million sites in the genome that differ at a particular base; one person's genome might have a “C” at that position while another person might have a “T” in that position. These single base differences are called SNPs (snips) and are distributed across the genome sequence. (B) About 10% of the small genome sequence differences (SNPs) are distributed unevenly across the genome DNA and are correlated with different traits. (a) Most SNPs, such as the ones shown in different shades here, occur at about the same frequency in people from Africa, Europe, and Asia (genetic variation is shown in different shades). (b) The pale skin color common in Europeans is caused by an SNP that is never found in populations in Africa or Asia (people with this SNP are shown in a lighter shade). (c) A DNA change that is found exclusively in East Asian populations correlates with a reduced ability to sweat and is carried by almost all people of Chinese ancestry (darker shades).

In the last section of this chapter we will discuss the 
controversies surrounding the impact of race on genes, 
IQ, healthcare, and lifestyle and discuss the genetic 
explanations for the apparent differences between 
African-Americans (black people) and European 
Americans (white people). 


THE HISTORY OF RACE 


Race is a Modern Concept Created by Humans


During the eighteenth and nineteenth centuries and much of the twentieth, the notion of “race” was presented as a concept that had always been part of human culture. At a time when people tried to justify the idea that black people are fundamentally inferior, some scholars turned to the writings of ancient cultures and used the texts of the ancient Greeks and Romans as examples of the early origins of race as a human concept. Franz Boas
(1858-1942) was a pioneering anthropologist who was 
among the first to call into question the traditional notion 
of race at the time. He initially encountered resistance 
within his own scientific community, but later those sci- 
entists were among the first to denounce the use of race 
as a way to define and classify human differences. 

In the late twentieth century, scholars began to 
re-examine the ancient texts more closely. Frank 
Snowden and Ivan Hannaford (Woodrow Wilson International Center for Scholars) revisited the origins of race. Hannaford’s work places the beginning of
racial theory in 1684 with the publication of a book 
by Francois Bernier, which divided people according 
to their observable characteristics in terms of race and 
ethnicity. For much of the seventeenth, eighteenth, 
and even into the twentieth century, people believed 
that race had been used by humans to classify humans 
since the start of civilization. Part of the ideology 
behind the concept of different races included the 
idea that the different human races evolved separately. 
Whereas blacks evolved in Africa, it was said, whites 
evolved separately in the Aryan Plain. This was the 
origin of the name Aryans for whites, a term adopted 
by the Nazis. In the twenty-first century, we have clear
evidence from genomic studies and archaeological 
research showing that all modern humans evolved 
at one time in Africa, about 200,000 years ago (see 
Chapter 8). 


The concept of “race” is a relatively recent invention of 
society and was not always part of human culture. The 
constant characteristics attributed to different races led to 
the incorrect idea that different races of humans evolved 
separately. 


THE GENETICS OF PHYSICAL CHARACTERISTICS


If we define “race” using biological traits such as skin 
color, facial characteristics, height, and weight, then in
that specific context, race is genetically determined. 
These physical traits, or phenotypes, are determined by 
the particular alleles of genes that are inherited by the 
individual. Alleles are different versions of the same 
gene, such as a wild type (normal) allele and a mutant 
(altered) allele of the same gene. 

The term “race” is sometimes used interchangeably with terms such as “variety” (as in plants) and “subspecies.” However, biologists categorize a plant or animal group as part of a separate race (subspecies, or variety) if it satisfies one of the following two criteria.

First, the group must have its own separate genetic 
lineage, which means that individuals never, or rarely, 
would mate outside the group. Clearly, the human 
groups currently called races do not satisfy this criterion, 
and according to historical and archaeological evidence 
they never have. 

Second, the genetic differences between one group 
and another group would have to be significantly 
greater than the genetic variation within members of 
either group. Human races do not fulfill this requirement either. In fact, modern genome studies have shown
that there is actually more genetic variation (DNA differ- 
ences) within races than exists between people from dif- 
ferent races (see Figure 16.2). For example, the genetic 
variation among Africans, including African Americans, 
is greater than the genetic variation in genomes from 
the rest of the human population. 

Genetic studies show that people will sometimes have
alleles in common if their families have lived in the
same geographic region for many generations. Studies
also show that there is no single gene that is found
in one race but not in a different race, which once
again calls into question the genetic basis of the term
“race.” As expected, researchers did identify differences
in allele frequencies among the genomes of people
belonging to different races. The differences between
these groups represent from 4% to 10% of the total
amount of genetic variation in the human genome, but
they do not shed light on the biological concept of race.
Scientists might never agree on a clear genetic definition
of ‘race’ because so far there are no clear genetic
differences that distinguish between different ‘human races’.
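The comparison between variation within groups and variation between groups can be made concrete with a small calculation. The sketch below is a minimal illustration only, assuming a single gene with two alleles and three equally sized groups whose allele frequencies are invented for the example (they are not data from any study). It compares the expected heterozygosity (the chance that two randomly chosen copies of the gene differ) inside each group with that of all groups pooled; the between-group share is the quantity population geneticists call G_ST, one form of F_ST. The 4% to 10% figure in the text comes from genome-wide studies, not from a toy calculation like this one.

```python
from statistics import mean

def expected_heterozygosity(p: float) -> float:
    """Chance that two randomly drawn copies of a two-allele gene differ: 2p(1 - p)."""
    return 2 * p * (1 - p)

def between_group_share(freqs: list[float]) -> float:
    """Fraction of total variation lying between equally sized groups (Nei's G_ST)."""
    h_within = mean(expected_heterozygosity(p) for p in freqs)  # average inside groups
    h_total = expected_heterozygosity(mean(freqs))              # all groups pooled
    return (h_total - h_within) / h_total

# Hypothetical frequencies of one allele in three groups (illustrative, not real data).
groups = {"group 1": 0.35, "group 2": 0.50, "group 3": 0.65}
share = between_group_share(list(groups.values()))
print(f"between groups: {share:.0%}, within groups: {1 - share:.0%}")
```

With these made-up frequencies, about 6% of the variation lies between the groups and about 94% lies within them, which is the same qualitative pattern the text describes.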


Race is Not a Genetic Concept 


To date (2009), there is no reproducible evidence from
genome research that supports the existence of separate
races (subspecies) of modern humans. Alleles that are
responsible for physical traits such as pale skin and
hair color have been found, but there are no consistent
patterns of human genes or alleles that distinguish
humans in one race from those in another. Scientists
have found no evidence to support the classification of
humans based on ethnicity. In fact, genome studies have
shown that there is more genetic variation (DNA
differences) within ‘races’ than between people of
different races.

Most scientists agree that the evolution of the modern
human (Homo sapiens) (Figure 16.4) involved waves
of migration out of Africa. The first hominids (Homo
erectus) moved out of Africa and into Europe and Asia
about 1 million to 2 million years ago (1.0-2.0 mya),
and a second wave of migration followed about 100,000
to 200,000 years ago (see Chapter 8). The theories
proposed to explain the evolutionary origin of modern
Homo sapiens predict different types of encounters
between the most recent African migrants and other
hominid groups. Some evidence indicates that the groups
co-existed, with possible interbreeding between the recent
African migrants and other hominids. Scientists are using
DNA analysis to determine whether the Homo sapiens
genome is derived solely from the DNA of the most
recent wave of African migrants, and whether the DNA
record includes evidence of interbreeding with the other
hominids along the route to Europe and Asia.

FIGURE 16.3 Small differences in the DNA genome of an animal can cause dramatic differences in physical characteristics such as
fur color. (A) This jaguar has the familiar coat color that is common to most of the South American jaguar population. (B) This dark-colored
jaguar represents about 6% of the South American population and is caused by a polymorphism in the jaguar genome.

FIGURE 16.4 Homo sapiens: the modern human. Most scientists
agree that the modern human (Homo sapiens) evolved from Homo
erectus, a human-like primate who walked upright on two feet (see
Chapter 8).

The evolutionary pressures of natural selection and
genetic drift (random fluctuations in the frequencies of
DNA sequence variants) act on genetic variation but do
not cause the actual changes in the DNA sequence. In
the natural environment outside the lab, genetic variation
in genomic DNA sequences is caused by mistakes in
DNA replication (see Figure 16.2) and by DNA damage
due to environmental toxins and radiation, such as the
UV rays in sunlight (see Chapter 9). The genetic
variation present across the current population of human
genomes reflects the differences among the genomes of
human ancestors.
Because a comparatively small number of humans
made the migratory trek out of Africa, the total amount
of DNA variation present in their collective DNA pool
was much less than that of the Homo sapiens who
stayed behind, a phenomenon known as the “founder”
effect (Figure 16.5). Once again, the human concept of
race fails to meet the previously stated criteria used to
identify a species or a subspecies.

FIGURE 16.5 Genetic variation and the founder effect. The migration
of hominids from Africa to Europe involved a small number of
individuals. As a result, the level of DNA sequence variation present
in the chromosomes of those who migrated was much less than the
sequence variation present in the genomes of the individuals who
stayed behind. This is called the founder effect. (Genetic variation is
indicated by the red and blue shapes.)
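The founder effect can be illustrated with a small random-sampling simulation. This is a hedged sketch, not a model of real human migration: the population size, the number of DNA variants, and the number of migrants are all hypothetical parameters chosen for illustration. Each chromosome copy in a source population carries one of several versions of the same DNA segment, and a small random subset of copies "migrates." Because the migrants are few, they can carry at most as many distinct variants as there are migrant chromosome copies, usually far fewer than the source population holds.

```python
import random

def founder_effect(n_copies=1000, n_variants=50, n_founder_copies=20, seed=1):
    """Return (distinct variants in source population, distinct variants among founders).

    All parameters are hypothetical; a fixed seed makes the run repeatable.
    """
    rng = random.Random(seed)
    # Each chromosome copy carries one of n_variants versions of the same DNA segment.
    source = [rng.randrange(n_variants) for _ in range(n_copies)]
    # The migrants are a small random subset of the source population's chromosome copies
    # (20 copies corresponds to 10 people, since each person carries two copies).
    founders = rng.sample(source, n_founder_copies)
    return len(set(source)), len(set(founders))

source_count, founder_count = founder_effect()
print(f"source population: {source_count} variants; founders: {founder_count} variants")
```

Changing the seed changes which variants the founders carry, but the founders never hold more distinct variants than their own number of chromosome copies, which is the loss of variation that Figure 16.5 depicts.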
The socially defined concept of race meets neither of the
criteria required to establish a genetically defined species
or subspecies. There are no alleles or genes present in
one ‘race’ and not in another ‘race’. There is no evidence
to support a biological concept of human races that would
justify prejudice and discrimination based on the artificial
concept of race.


Geographical Origins Reflected in DNA 
Differences 


Could the different human races have evolved during
the hominid migration out of Africa? This is possible,
because we know from evolution that one species can
evolve from another. Clearly evolution occurred over
this time period; after all, people living in Norway
do not look like Africans. On the other hand, having
ancestors from one geographical area rather than
another does not mean that these individuals will have
fundamental genetic differences.

To understand what is necessary for species (or
races) to evolve, consider the basic steps in the
evolution of a new species. Two animals are said to
be of the same species if they can mate and produce
fertile offspring. A mating that produces live but
sterile offspring indicates that the parents belong to
two separate, although evolutionarily very closely
related, species. For a new species to evolve, a
population must be physically (geographically) isolated
long enough for genetic changes to accumulate that
prevent it from breeding successfully with the original
population. Darwin’s finches, a group of closely related
bird species living on the Galapagos Islands, diverged
into separate species because their populations were
geographically separated into different environments
over many thousands of years, with no crossover of
birds from one group to another. The idea that
geographical separation could cause different species
to evolve was one of Darwin’s great insights.

For a subspecies or race of humans to evolve, two
human populations would need to be geographically
(physically) separated for a time period much longer
than it took humans to migrate out of Africa. Physical
separation of the migrating populations was also
unlikely. Genetic and archaeological data support a
gradual migration, with constant interactions among
the different villages as groups of people migrated
along the same or parallel paths. Research is underway
to sequence human genomes from all over the world,
which will substantially increase the genome sequence
information available for comparative analyses (see
Chapter 6). Researchers have searched the human
genome sequence databases but have found no
evidence to support the evolution of separate human
species or races.


Races (subspecies) and species evolve only after an
extended period of physical separation lasting many
thousands of years. It is very unlikely that human
populations were geographically separated long enough
for different human races to evolve.


Blood Type Genes Used To Classify 
Humans 


In 1900 and 1901, Karl Landsteiner showed that
human blood falls into distinct groups, which we now
know are determined by three blood type alleles, A, B,
and O (Table 16.1). He classified the groups according
to the types of antigens on the surfaces of red blood
cells and the kinds of antibodies that bind to those
antigens (Figure 16.6). For this work, Landsteiner won
the 1930 Nobel Prize in Physiology or Medicine.
Scientists now know that a number of other, rarer
antigens complicate actual blood typing, but none of
these blood typing methods has revealed a correlation
between blood type and race.

Landsteiner’s research was one of the first to show
that human biochemical characteristics are not divided
along racial lines. This work was used as a powerful
weapon to argue against racial prejudice and
segregation, and it provided important evidence to
support the idea that heritable traits are not related to
the human concept of race.

Two antigens, A and B, occur on the surface of red
blood cells (see Figure 16.6). An antigen is a molecule,
often a protein, that can stimulate an antibody response
when it is introduced into the body; the A and B
antigens are short sugar chains attached to proteins and
lipids in the red blood cell membrane. Antibodies are
made in response to a specific antigen. Antigens can be
toxins, bacteria, pollen, animal dander, and foreign blood
cells. An antibody is a protein complex made by immune
cells and secreted into the blood or lymph in response
to an antigen. This is one way that the immune system
protects the body from disease. Each antibody recognizes
a specific antigen and binds extremely tightly to it. This
is part of the mechanism that allows the body to
recognize its own cells and tissues as “self”.

Most people know that blood type is important if
they need a blood transfusion at a hospital, but few
know why. Using blood of the wrong type can be fatal,
which is why healthcare workers carefully cross-check
the patient's blood type and the blood products before
treating the patient. The A and B antigens on the
surface of red blood cells are recognized by anti-A and
anti-B antibodies. If a patient receives the wrong blood
type, the patient's antibodies bind to the antigens on
the donated cells (for example, anti-A antibodies bind
to A antigens), causing the blood cells to agglutinate
(clump together). This is dangerous because the
agglutinated blood cells break open and release their
contents, which are toxic (see Figure 16.6). The genetic
inheritance patterns of the A, B, and O alleles that
determine the blood groups are shown in Table 16.2.

TABLE 16.1 Human blood type groups

Phenotype Genotype
A AA or AO
B BB or BO
AB AB
O OO

[Figure 16.6 panels: Group A, Group B, Group AB, Group O; rows: red blood cell type, antibodies present, antigens present]

FIGURE 16.6 Blood type (or blood group) is determined by the
ABO blood group antigens on the surfaces of red blood cells. The
four different human blood groups involve the following
components: Blood group A has A antigens and anti-B antibodies.
Blood group B has B antigens and anti-A antibodies. Blood group O
has no A or B antigens but has both anti-A and anti-B antibodies.
Blood group AB has both A and B antigens, and no anti-A or anti-B
antibodies.

The four human blood groups, A, B, AB, and O,
have the following components:


• Blood group A has A antigens and anti-B antibodies.

• Blood group B has B antigens and anti-A antibodies.

• Blood group O has no A or B antigens but has both
anti-A and anti-B antibodies.

• Blood group AB has both A and B antigens, and no
anti-A or anti-B antibodies.
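The antigen and antibody rules in this list are enough to predict which transfusions would cause agglutination. The sketch below is a simplified illustration, assuming only the ABO system is considered; it deliberately ignores the Rh factor and the other, rarer antigens mentioned earlier, so it is not a substitute for real cross-matching. A transfusion of red cells is treated as compatible when none of the donor's antigens is bound by an antibody the recipient already carries.

```python
# ABO antigens on red blood cells for each of the four groups listed above.
ANTIGENS = {"A": {"A"}, "B": {"B"}, "AB": {"A", "B"}, "O": set()}

def antibodies(group: str) -> set[str]:
    """A person carries antibodies against whichever of A and B their own cells lack."""
    return {"A", "B"} - ANTIGENS[group]

def red_cells_compatible(donor: str, recipient: str) -> bool:
    """True if no recipient antibody binds a donor antigen (so no agglutination)."""
    return not (ANTIGENS[donor] & antibodies(recipient))

for donor in ANTIGENS:
    recipients = [r for r in ANTIGENS if red_cells_compatible(donor, r)]
    print(donor, "->", ", ".join(recipients))
```

Run as written, the loop prints one line per donor group, showing for example that group O red cells (no A or B antigens) are not bound by any recipient's anti-A or anti-B antibodies, while group AB red cells are compatible only with group AB recipients.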


The Punnett square method predicts all the possible
combinations of maternal and paternal alleles for each
gene studied. The diagram organizes the parental
alleles in a genetic cross into a grid, so that every
possible combination of one maternal and one paternal
allele, and therefore every genotype that the potential
offspring of that particular mating could inherit,
appears in one cell of the grid.
In the case of human blood types, the three possible
alleles (A, B, and O) from each parent are placed in the
top row and the left column of the grid. Each cell shows
the offspring genotype in regular font, with the
corresponding phenotype in bold italics and parentheses
(Table 16.2).

TABLE 16.2 Blood type genotypes and phenotypes predicted
from parental alleles (A, B, O) x (A, B, O)
Parent Alleles A B O
A AA (A) AB (AB) AO (A)
B AB (AB) BB (B) BO (B)
O AO (A) BO (B) OO (O)
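Table 16.2 lists all three ABO alleles on each side of the grid; for a particular couple, each parent contributes one of the two alleles they actually carry. The sketch below builds that smaller Punnett square for two given parental genotypes and converts each cell to a blood type using the dominance relationships in Table 16.1 (A and B are both expressed together, and each is dominant over O). The function names are illustrative, not taken from any library.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def phenotype(genotype: str) -> str:
    """Blood type from two alleles: A and B are codominant, both dominant over O."""
    alleles = set(genotype)
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"
    if "B" in alleles:
        return "B"
    return "O"

def punnett(mother: str, father: str) -> dict[str, Fraction]:
    """Probability of each blood type among the children of two parents.

    mother and father are two-letter genotypes such as "AO"; each of the four
    pairings of one maternal and one paternal allele (one grid cell) is equally likely.
    """
    cells = [phenotype(m + f) for m, f in product(mother, father)]
    counts = Counter(cells)
    return {blood_type: Fraction(n, len(cells)) for blood_type, n in counts.items()}

print(punnett("AO", "BO"))
```

For an AO parent and a BO parent, the four grid cells are AB, AO, BO, and OO, so each of the four blood types (AB, A, B, and O) is predicted with probability 1/4.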

A person's blood type is determined by genetics and
biochemistry, with little or no contribution from
environmental factors. However, the relative frequencies
of the blood type alleles and the distribution of the
different blood types show considerable variation
in different populations and geographical locations
(Figure 16.7) (Table 16.3).

Due to the extensive mixing of human populations 
over time, no single blood type currently predominates 
in the human population, including blacks and whites 
in the United States. The evolutionary significance of 
the geographical distributions of human blood types is 
not yet known, although some preliminary results sug- 
gest that blood type O might make a person slightly 
less susceptible to bubonic plague. 

When scientists and others define populations
culturally, geographically, or both, they find many
genetic variations within each population, and these
variations are unevenly distributed among different
populations, as is observed for blood type.


Most human populations have all the possible blood types, 
with no one blood type predominating. This is partly due 
to the extensive mixing of human populations that helped 
to distribute the blood types worldwide. 


CONTROVERSIES IN HEALTH, MEDICINE, 
AND THE IQ TEST 


There are well-known disparities in health and medical
care among the designated human races in the United
States (Table 16.4). That African Americans experience
illness more often than European Americans and that
blacks in the United States have a shorter life span are
scientific facts. But the reasons for these differences
are disputed and require further research. What are
the relevant scientific facts?
FIGURE 16.7 Worldwide distributions of human blood type groups.


TABLE 16.3 Human blood type distribution varies in different populations and geographical locations (%)


Population O A B AB
Aborigines 61 39 0 0
Arabs 34 31 29 6
Bororo (South American Indian) 100 0 0 0
Eskimos (Alaska) 38 44 13 5
Eskimos (Greenland) 54 36 23 8
Jews (Germany) 42 41 12 5
Jews (Poland) 33 41 18 8
Kikuyu (Kenya) 60 19 20 1
United States (blacks) 49 27 20 4
United States (whites) 45 40 11 4


Three factors were found to affect cancer mortality
in the African American population of the United
States: (1) blacks are more likely to develop cancer,
(2) blacks often wait longer than whites to get medical
care, and (3) blacks have lower survival rates after a
cancer diagnosis, compared to whites. To what extent
are these factors responsible for the differences in
cancer deaths between blacks and whites, and is
there a genetic basis for these differences?

A new study by the University of California, Los
Angeles (UCLA), published in the Journal of General
Internal Medicine (2009), found that for most types of
cancer, the difference in mortality is almost entirely due
to the fact that African Americans are more likely to
get cancer in the first place. The stage at diagnosis and
survival time played a much smaller role in mortality.
This was the first time that research had clarified the
role these differences play in increasing cancer
mortality and decreasing the life expectancy of African
Americans.

Breast cancer is a notable exception identified by
this study. Whereas white women were more likely
to get breast cancer than African American women,
the difference in mortality between white and black
women was mostly due to the gap in breast cancer
screening and treatment. This study was significant
because it involved many people; the researchers
analyzed data sets from the Surveillance, Epidemiology,
and End Results (SEER) cancer registry and the National
Health Interview Survey (NHIS) involving about
2.7 million white and 291,000 African American
cancer patients from 12 geographic regions in the
United States.
TABLE 16.4 Health status measures in racial/ethnic groups in the United States, 1998


White Black Hispanic Asian

Cause of death Age-adjusted death rates*

All causes 450.4 690.9 432.8 264.6
Heart disease 121.9 183.3 84.2 67.4
Coronary disease 79.2 92.5 54.7 42.9
Stroke 23.3 41.4 19.0 22.7
Cancer 121.0 161.2 76.1 74.8
Chronic obstructive pulmonary disease 21.9 ? 8.5 7.4
Pneumonia/influenza 12.7 17.4 9.8 10.3
Liver disease 7.1 8.0 11.7 2.4
Diabetes mellitus 12.0 28.8 18.4 8.7
HIV infection 2.6 20.6 6.2 0.8
External causes 46.7 68.8 44.7 24.4
Infant mortality (*) 6.0 13.6 5.8 5.5
Life expectancy (yrs) 77.3 71.3 >80? >80?
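One simple way to read Table 16.4 is to divide the death rate for black Americans by the rate for white Americans for each cause; a ratio above 1 means a higher age-adjusted death rate among black Americans. The sketch below does that arithmetic for a few rows, using values copied from the table (the illegible chronic obstructive pulmonary disease entry is left out). As the surrounding text stresses, such a ratio only describes the size of a disparity; it says nothing about whether the cause is genetic, environmental, or due to differences in medical care.

```python
# Age-adjusted death rates copied from Table 16.4 (1998): (White, Black).
DEATH_RATES = {
    "All causes": (450.4, 690.9),
    "Heart disease": (121.9, 183.3),
    "Stroke": (23.3, 41.4),
    "Cancer": (121.0, 161.2),
    "Diabetes mellitus": (12.0, 28.8),
    "HIV infection": (2.6, 20.6),
}

def rate_ratio(white_rate: float, black_rate: float) -> float:
    """How many times higher the rate for black Americans is than for white Americans."""
    return black_rate / white_rate

for cause, (white, black) in DEATH_RATES.items():
    print(f"{cause:<18} {rate_ratio(white, black):.2f}")
```

For all causes combined, the ratio is about 1.53, meaning the age-adjusted death rate for black Americans in 1998 was roughly one and a half times that for white Americans.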


The researchers concluded that continuing to
improve cancer treatment and screening methods is
undoubtedly important for improving the life expectancy
of all adults, but substantial disparities in cancer
mortality will probably persist unless we can find ways
to address “the enormous impact of racial differences in
cancer incidence.”


The New Era of Human Genomics 


When the Human Genome Project (HGP) was first
announced, its goals were articulated in terms of the
anticipated medical benefits (see Chapter 6). This
international public project was funded in the United
States through the National Institutes of Health and the
Department of Energy. Great Britain funded almost a
third of the overall project through the Wellcome Trust,
and other countries, such as Japan, also contributed
funding and personnel. Several private corporations
also worked on the human genome sequence, including
Celera Genomics, founded by J. Craig Venter (see
Chapter 6).

The Human Genome Project (public and private 
groups together) not only gave us the complete DNA 
sequence of the human genome, but also provided 
a technical foundation for research on many other 
genomes in addition to the human genome. Research 
on the complexity of human genes revealed the ubiqui- 
tous presence of introns and exons that often code for 
overlapping genetic information. Scientists have refined 


the traditional definition of a gene to more appropri- 
ately reflect the new characteristics of human genes 
based on extensive research on the human genome 
(see Chapter 6). 

Among the benefits promised by human genome
research was the ability to develop effective treatments
for devastating human genetic diseases such as
Huntington disease (HD), muscular dystrophy, and cystic
fibrosis (see Chapter 10). In some cases scientists know
the precise DNA mutation that causes a genetic disease.
Using this precise information, scientists can locate the
wild type (normal) alleles, which can be used in gene
therapy methods to correct mutant genes by treating
the patient with the wild type genes. This approach was
used to successfully treat blindness in humans and dogs
and to alleviate the symptoms of Parkinson’s disease
and other human diseases (see Chapters 10 and 11).


Personalized Medicine: Genetic Risk and 
the Environment 


People in the United States are afraid of getting
cancer, but in terms of actual risk, they should be more
concerned about developing heart disease or diabetes.
These diseases, and many others such as depression and
bipolar disorder, result from a complicated interplay of
multiple mutant genes that is further complicated by the
influence of environmental factors.
Medicines seldom work the same way in all patients,
because each person’s genetic and biochemical makeup
is slightly different from everyone else’s. A person’s
makeup depends on many factors in addition to
genetics, including lifestyle and diet. Both can increase
the risk of heart disease, but their impact is often small
compared to the influence of genes that cause heart
disease to ‘run in a family’ (that is, to be inherited).

The new genetic approach to healthcare is motivated
in part by patients who suffer side effects from
medications, and it might help to improve the efficiency,
price, and effectiveness of specialized drugs and
therapies. Choosing drug dosages based on genetic
information, in addition to other patient characteristics,
would allow doctors to avoid testing a large number of
potential medicines by trial and error to decide how to
treat each individual patient. In the future, advocates
claim, the doctor will compare each patient’s genome
with sequences in a large DNA database and select the
medicine that the patient’s genome predicts will be the
most effective drug available, with the least risk of side
effects. In recent years, important progress has been
made in areas of medicine where human genetics
overlaps with behavior, depression, and personality
traits. For example, scientists found that the rate at
which drugs are metabolized in the human body is
strongly influenced by inherited differences in liver
enzymes, especially for drugs used to treat social
anxiety, depression, and similar disorders. Some DNA
mutations slow the body’s drug metabolism, possibly
causing serious side effects as a result of the buildup of
drugs and their breakdown products (metabolites). The
widespread use of pharmacogenomics and personalized
medicine could have a large impact on how physicians
routinely prescribe medications if it results in a more
accurate way to dose medicines for individuals.
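The idea that inherited variants change how fast the body breaks down a drug can be sketched as a simple lookup. The example below is a deliberately simplified, hypothetical model: it assumes a single drug-metabolizing gene and classifies a person only by how many working copies of that gene they carry. The category names (poor, intermediate, normal, ultrarapid metabolizer) are terms used in pharmacogenomics, but real clinical classification uses more detailed scoring of specific alleles, and this sketch gives no dosing advice.

```python
def metabolizer_status(functional_copies: int) -> str:
    """Hypothetical, simplified mapping from working gene copies to drug-metabolism speed."""
    if functional_copies < 0:
        raise ValueError("copy number cannot be negative")
    if functional_copies == 0:
        return "poor metabolizer"          # drug may build up, raising the risk of side effects
    if functional_copies == 1:
        return "intermediate metabolizer"
    if functional_copies == 2:
        return "normal metabolizer"
    return "ultrarapid metabolizer"        # extra gene copies: drug may be cleared very quickly

# Hypothetical genotype: one working copy and one non-working copy of the gene.
genotype = ("working", "non-working")
print(metabolizer_status(genotype.count("working")))
```

A lookup like this only makes sense once the relevant gene and its alleles are known, which is why pharmacogenomics depends on genotyping the individual patient rather than on group labels.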

There is much research to do in order to better
understand the details of the interactions between
human genes and the environment. Sometimes
researchers use “surrogates,” or arbitrary characteristics,
to define specific groups of people who react better
to one medicine or another in clinical trials.
Increasingly, researchers have used “race” as a
surrogate characteristic, which has raised ethical and
moral dilemmas for many people. Currently, some drugs
are marketed solely to members of specific racial
groups, with claims that they are more effective in one
“race,” just as some medications are marketed primarily
to members of one gender. To provide the correct
medical information when advising patients, doctors
and other healthcare providers must keep up to date
with new developments in science and medicine in
their specialty, and they must be aware of the
economic, social, cultural, and ethical issues associated
with each new application. One good example is a
drug marketed “for blacks only,” which is very popular
even without appropriate drug safety testing.


Box 16.1 The BiDil Story


On June 23, 2005, the first so-called ethnic or racial drug,
BiDil (pronounced bye-dill), was approved for use by one
racial-ethnic group, African Americans. Hailed at the time
as a preview of the many future drugs that would also
target specific ethnic and racial groups, it was applauded as
an example of the medical establishment finally paying
attention to African Americans, a group long neglected by
the healthcare industry. Critics claim that the difficulties
BiDil encountered in the FDA approval process had mostly
to do with commercialization and patent expiration dates.

Sold as “for blacks only,” BiDil is a fixed-dose
combination of isosorbide dinitrate and hydralazine,
designed to treat congestive heart failure, which is common
in blacks. BiDil became controversial as the first drug
approved by the FDA for use in only one racial-ethnic group.

Critics note that the research supporting the attributes
of BiDil involved testing BiDil in African American people
only. The efficacy of BiDil in treating congestive heart
failure in other human populations remains unknown.

After the original patent covering the production of BiDil
was not renewed, the issue of marketing a drug to one race
became more controversial. The research to test the efficacy
of BiDil involved small numbers: only 49 African Americans
in one group and just 1050 self-identified African Americans
in another, both small sample sizes even by government
testing standards. On the basis of this questionable research,
a new patent was filed, and the FDA approved BiDil with a
race-specific label, for use in treating black patients only.

Many questions concerning this race-targeted drug
remain unanswered, including its efficacy in patients of one
race compared to patients of another race. Even if, as
advertised, BiDil works better in African Americans than in
other races, it is still questionable policy to restrict its use
to African Americans. This limitation poses a real risk that
some people who might benefit from taking the drug will
not have access to it because they cannot prove they belong
to the correct race. How will doctors define race with regard
to access to the drug? Must a patient be 100% black or Asian
American? What about people who are one half or one
quarter African American; are they black enough to have
access to this drug?


GENES AND THE IMPACT OF 
ENVIRONMENT 


Some of the differences in health status observed among 
people of different ‘races’ might be due to the influence 
of the environment. In the past it was common to try to 
quantify the contributions from nature (i.e., genes) and 
nurture (i.e., environment) to an individual, but now we
know that it is not possible to accurately quantify the 
contributions of genetics and environment to a particu- 
lar trait and phenotype of an individual. 

All the cells in an individual’s body have identical
genomic DNA sequences and carry exactly the same
genes. Even so, the genes are expressed differently in
different cell types. For example, muscle cells express
different genes than nerve cells, which is why muscle
cells contract and relax while nerve cells transmit
electrical nerve signals (see Chapter 6). It is the
expression of different genes in different cell types that
allows cells to carry out specialized biochemical
functions in the body. This is a very important concept:
all of our cells have the same genes, and it is differential
gene expression in different cells that generates different
types of cells and tissues. Although the cells in one
person’s body all contain identical copies of that
person’s genome, different people often inherit different
versions of the genes, called alleles. Humans inherit two
copies of each gene, one from the mother and one from
the father (see Chapters 9 and 10). The two copies can
be identical (both alleles normal, or wild type, or both
alleles mutant), or they can differ from each other, with
one wild type allele and one allele that contains the
gene mutation.
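The three possible combinations of two inherited alleles, and what happens when two parents who each carry one mutant copy have children, can be sketched as follows. This is an illustrative model of a single gene, assuming the hypothetical symbols "+" for the wild type allele and "m" for the mutant allele. For a disease such as sickle cell anemia, where illness results from inheriting two mutant copies, the count of "both mutant" children shows the familiar 1 in 4 chance for each child of two carriers.

```python
from collections import Counter
from itertools import product

WILD, MUTANT = "+", "m"   # hypothetical symbols for the two alleles of one gene

def classify(genotype: tuple[str, str]) -> str:
    """Describe a pair of inherited alleles (one maternal, one paternal)."""
    mother_allele, father_allele = genotype
    if mother_allele != father_allele:
        return "heterozygous: one wild type, one mutant"
    if mother_allele == WILD:
        return "homozygous: both wild type"
    return "homozygous: both mutant"

def children(mother: tuple[str, str], father: tuple[str, str]) -> Counter:
    """Count each kind of child over the four equally likely allele pairings."""
    return Counter(classify((m, f)) for m, f in product(mother, father))

carrier = (WILD, MUTANT)   # a parent with one wild type and one mutant allele
print(children(carrier, carrier))
```

Of the four equally likely pairings, one gives two wild type alleles, two give one of each, and one gives two mutant alleles.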

The environment has a large impact on gene 
expression and on the functions of the gene product. 
However, in the case of a genetic disease such as sickle 
cell anemia, inheriting two copies of the mutant globin 
gene has an overwhelming impact on the individual, 
and causes a serious disease. Living in a healthy envi- 
ronment and following a healthy lifestyle is good but 
will not prevent or reverse sickle cell disease in people 
who inherit two mutant globin alleles. The impact of 
the environment tends to be larger when the disease 
is caused by multiple genes, such as heart disease and 
diabetes (type II) (see Chapter 10). Like most cancers, 
the development of prostate cancer involves contribu- 
tions from both genes and environment. The prostate 
glands of Japanese men develop different forms of 
prostate cancer depending on whether they are living 
in the United States or Japan. In this case it is a com- 
bination of environment and genetics that causes can- 
cers to develop (see Chapter 9). 


Heart Disease is Caused by Many Genes
and the Environment


Cardiac (heart) disease is a good example of a human
disease that is impacted by environmental factors such
as lifestyle and diet as well as by inherited genes.
FIGURE 16.8 Cholesterol functions in the cell membrane.
Cholesterol has important functions in the cell plasma membrane. The 
membrane is a double layer of phospholipids (blue) with their charged 
‘head units’ exposed on the outside of the membrane and the fatty 
acid tails located inside the membrane. Cholesterol is transported by 
the lipoprotein carriers, LDL and HDL, to different cells in the body. 


Cardiac disease can be caused by defects in cardiac 
genes and proteins, but heart disease is often caused 
by processes that are not directly involved with the 
structure of the heart. Scientists discovered a clear cor- 
relation between the incidence of heart disease and 
high levels of certain forms of cholesterol in the blood. 
Cholesterol is important because it is one risk factor 
that people can control in order to reduce the risk of 
heart disease. 

Cholesterol is needed to build new cell membranes
and to make steroid (non-protein) hormones. However,
because of its chemical structure, cholesterol cannot
dissolve in water or in the blood (Figure 16.8). The body
solves this problem with two types of lipoprotein
carriers, HDL and LDL, which transport cholesterol in
the blood to and from the cells. Blood tests measure
several forms of blood lipids:


• Low-density lipoprotein (LDL), known as “bad”
cholesterol

• High-density lipoprotein (HDL), known as “good”
cholesterol

• Triglycerides

• Lp(a) cholesterol


Total Cholesterol Count 


The total cholesterol count, along with the levels of
LDL, HDL, triglycerides, and Lp(a) cholesterol, is
determined by a blood test. LDL (bad) cholesterol
carries 60% to 75% of the cholesterol in the blood. If
LDL levels are too high, LDL slowly builds up on the
inner walls of the blood vessels, clogging the arteries
to the heart and brain. LDL promotes the formation of
plaque, a thick, hard deposit that narrows the arteries
and causes atherosclerosis. A clot can form at the
plaque and cause a heart attack or stroke.

About 25% to 40% of blood cholesterol is carried by
high-density lipoprotein (HDL), which is considered
healthy because high HDL levels protect against heart
attack. HDL helps to remove excess cholesterol from
arterial plaques, which slows the accumulation of
plaque. HDL also carries cholesterol back to the liver,
where it is removed from the body. Triglycerides are a
type of fat that is also measured by a blood test. High
triglyceride levels can be caused by obesity, physical
inactivity, cigarette smoking, excess alcohol, and a
carbohydrate-rich diet, and are common in people
with heart disease and/or diabetes. Lp(a) is a genetic
variant of the bad LDL. High levels of Lp(a) are a
significant risk factor for premature fatty deposits in
the arteries.

The blood levels of the LDL and HDL forms of cho- 
lesterol reflect contributions from diet, lifestyle (amount 
of exercise), and genetics. Some people inherit a com- 
bination of genetic alleles that permit them to consume 
high levels of fatty cholesterol-rich foods without dis- 
rupting the relatively low levels of the bad LDL choles- 
terol in the blood. Other people inherit gene alleles that 
act to maintain high levels of cholesterol despite eating 
a very low-fat diet and getting plenty of exercise. Recent 
medications called statins have been very successful at 
reducing the level of LDL cholesterol and increasing 
the HDL levels in the blood, reducing the impact from 
environment, diet, and lifestyle. 


Hypertension: A Silent Killer 


High blood pressure (hypertension) is also influenced
by genetic and environmental factors. High blood
pressure is often called the “silent killer” because it
usually causes no warning symptoms, yet untreated
hypertension can lead to stroke and heart disease at a
young age. Health studies show that, compared with other
Americans, African Americans have a much higher
incidence of uncontrolled high blood pressure. However,
it is not yet clear to what extent this is caused by
genetic or environmental factors. A recent study found
that native Africans living in Africa are genetically very
similar to African Americans but do not have the
uncontrolled high blood pressure found in black
Americans. This finding emphasizes that a great deal of
research remains to be done to understand the
relationships among genes, environment, hypertension,
and the distribution of this silent killer in various
population groups.


WHAT CAUSES GENETIC DISEASES TO 
PREDOMINATE IN CERTAIN HUMAN 
POPULATIONS? 


If we accept the idea that race is not a genetic concept, 
then why do certain genetic diseases predominate in 
certain racial groups (Table 16.4)? Here we will examine 
sickle cell anemia, a genetic disease that affects many 
more African Americans than Caucasian Americans (see
Chapter 10). 

The obvious risk in using race as a surrogate for
genetics is that it could hide the real genetic reason
why one drug works better for some people than for
others, and could discourage the genetic analysis
needed to find that reason. Race might be a surrogate
for some genetic differences between people, but even
if a drug works better for African Americans than for
Caucasians, there is no sharp distinction (and no clear
genetic boundary) between the two populations. All
humans have the same genes regardless of race; in
other words, there is no gene, no allele, and no DNA
sequence that is present only in the genomes of people
in one race but not in another race. Furthermore, an
increasing number of Americans are of mixed race and
do not fit into the existing categories of race based on
physical appearance.

Scientists have compared the death rates for people
of different races who have died of cancer or other
chronic diseases (Table 16.4 and Table 16.5). To
overcome the age bias inherent in comparing different
populations, most comparative death studies use
age-adjusted rates. A population with a higher average
age will have a higher overall death rate from chronic
diseases such as heart disease and cancer, which
predominantly affect older people. A younger population,
on the other hand, will have a larger number of deaths
from accidental causes, such as automobile accidents.
To make sure that a comparison actually addresses
differences among races rather than differences in age,
the death rates must be adjusted for the ages of the
populations under study.
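Age adjustment of this kind is commonly done by direct standardization: each age group's death rate is weighted by that age group's share of a fixed standard population, so that populations with different age structures are compared as if they had the same mix of ages. The Python sketch below illustrates the idea under simple assumptions; the function names, the two age groups, the 50/50 standard population, and all counts are hypothetical, and real studies use many more age groups and a published standard population.

```python
# Direct age standardization (age adjustment) of death rates.
# All names and numbers here are hypothetical, for illustration only.

def crude_rate(deaths, population, per=100_000):
    """Overall deaths per `per` people, ignoring age structure."""
    return per * sum(deaths) / sum(population)


def age_adjusted_rate(deaths, population, standard_weights, per=100_000):
    """Directly age-standardized death rate.

    deaths[i], population[i]: counts for age group i in the study population.
    standard_weights[i]: fraction of a fixed standard population in age
    group i; the weights must add up to 1.
    """
    if abs(sum(standard_weights) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    # Weight each age-specific rate by the standard population's age mix.
    return per * sum(
        w * d / p for d, p, w in zip(deaths, population, standard_weights)
    )


# Two age groups: [young, old]. Both populations have the same age-specific
# death rates (100 and 1,000 per 100,000) but very different age structures.
standard = [0.5, 0.5]                          # hypothetical standard population
young_pop = ([90, 100], [90_000, 10_000])      # (deaths, people) per age group
old_pop = ([10, 900], [10_000, 90_000])

for name, (deaths, people) in [("mostly young", young_pop), ("mostly old", old_pop)]:
    crude = crude_rate(deaths, people)
    adjusted = age_adjusted_rate(deaths, people, standard)
    print(f"{name}: crude {crude:.0f}, age-adjusted {adjusted:.0f} per 100,000")
```

In this made-up example the mostly young population has a crude rate of 190 per 100,000 and the mostly old population 910 per 100,000, even though their age-specific rates are identical; after adjustment both come out to 550 per 100,000, showing that the entire crude difference was due to age.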


Sickle Cell Anemia Is a Genetic Disease,
Not a Racial Disease 


Sickle cell anemia, which develops when a person
inherits two copies of a mutant beta-globin gene, is
much more common in African Americans in the U.S.
than in other groups. The normal (wildtype) beta-globin
gene encodes beta-globin protein, one of the two kinds
of protein chains that make up the hemoglobin in red
blood cells. Each hemoglobin molecule normally contains
four protein chains, two beta-globin and two alpha-globin,
which assemble with iron-containing heme groups to
produce the hemoglobin molecule that carries oxygen
(see Chapter 10). The severe symptoms of sickle cell
disease occur in people who inherit two mutant
beta-globin genes. Their red blood cells cannot make
normal beta-globin proteins, so the body must survive
on hemoglobin made with mutant beta-globin proteins.

In diseased red blood cells, the mutant beta-globin
proteins assemble with the alpha-globin proteins into
mutant hemoglobin molecules (HbS). Because of the
amino acid change in the beta-globin protein, the
abnormal hemoglobin molecules inside the cells
aggregate into stiff protein rods that grow long enough
to distort the red blood cells from normal Frisbee-like
disks into banana-shaped sickle cells (see Chapter 10).
Unfortunately, sickle red blood cells cannot pass easily
through the thin capillaries of the body; they block the
blood vessels and cause sudden attacks of severe pain,
fever, and swelling, and possibly organ damage. Attacks
can be triggered by any physical activity that increases
the body’s requirement for oxygen, because this creates
a low-oxygen environment that causes the mutant red
blood cells to sickle.


TABLE 16.5 Death rates from malignant neoplasms (cancers) in racial/ethnic groups in the United States, 1998 (deaths per 100,000)

                                   Total           Lung            Breast
Racial/ethnic group                Men    Women    Men    Women    Women
White                              146    106      49.4   27.4     19.0
Black                              208    129      70.8   27.2     26.2
American Indian/Alaskan Native      96     74      33.9   16.5     10.8
Asian/Pacific Islander              91     63      24.6   11.2      9.3
Hispanic                            93     64      21.4    8.3     12.5

Data source: National Center for Health Statistics. Health, United States, 2000 with Adolescent Health Chartbook. Hyattsville, Maryland: 2000.


A Crucial Link Between Sickle Cell Trait 
and Malaria Resistance 


HbS has been at the center of a medical and scientific
puzzle since the mid-1940s, when doctors in Africa
first noticed that people who carry sickle cell trait
were much more likely to survive malaria than people
who do not carry the trait (Figure 16.9).
The Plasmodium falciparum parasite that causes malaria
begins the human part of its life cycle when a mosquito
carrying the sporozoite form of the parasite bites a
person (Figure 16.10). The sporozoites enter the
bloodstream and migrate to the liver, where they infect
the liver cells and develop and multiply into the
merozoite form of the parasite. The merozoites rupture
the liver cells and enter the bloodstream (Figure 16.11),
where they infect red blood cells, continue to develop,
and eventually produce parasite gametocytes. The
gametocytes are taken up by a biting mosquito during a
blood meal and begin the next round of the parasite
life cycle.

In many parts of the world, malaria is a serious disease
that kills over 1 million people every year. Scientists
found that the sickle cell gene (the mutant beta-globin
gene) is especially prevalent in the genomes of people
from the parts of Africa that are typically hardest hit
by malaria. Evolutionary biologists have proposed that
the sickle cell mutation in the beta-globin (HbS) gene
became permanently established in the human population
after especially serious outbreaks of malaria took
place in Asia, the Middle East, and Africa.


FIGURE 16.9 Malaria is caused by a parasite. The Plasmodium falciparum parasite that causes malaria has a complicated life cycle. The sporozoite form of the parasite is shown here in the cytoplasm of a mosquito midgut epithelial cell (false-color electron micrograph).

People (carriers) who inherit one mutant beta-globin
allele and one normal (wildtype) beta-globin allele do
not typically suffer sickle cell symptoms unless they
exercise strenuously in a low-oxygen environment.
Their red blood cells make both the wildtype and mutant
beta-globin proteins in the same cell, and as a result
the cells retain their disk-like shape under normal
oxygen conditions. Even though the red blood cells in
sickle cell carriers make only about half the normal
amount of wildtype beta-globin protein, there is
apparently enough of it to assemble sufficient functional
hemoglobin molecules to greatly reduce the formation of
sickle-shaped cells and prevent disease symptoms.


Selective Advantage for Sickle Cell Trait and 
Malaria Resistance 


People who are carriers of sickle cell disease are
common in malarial regions because of natural selection.
The same mutant HbS beta-globin allele that causes
sickle cell disease in people who inherit two copies also
protects people who inherit one copy (sickle cell trait)
against severe malaria caused by Plasmodium falciparum.
As a result, carriers are more likely to survive malaria
and to pass their genes, including the sickle cell allele,
to the next generation. How did this protective effect
influence the distribution of the sickle cell mutation in
the human population?


FIGURE 16.10 The mosquito is a vector for malaria. This mosquito (Anopheles albimanus) is feeding on a human arm. The mosquito carries the parasite that causes malaria, making mosquito control an effective way of reducing the risk of malaria.

Scientists were surprised to learn that in some parts
of Africa as much as 40% of the population carries one
copy of the mutant beta-globin gene (sickle cell trait),
and these carriers suffer few or no serious symptoms
from malaria. Where malaria is common, the resistance
to malaria conferred by the sickle cell trait gives
carriers a significant selective advantage, especially
during frequent outbreaks of malaria. But sickle cell
carriers also face a disadvantage: if both parents are
carriers, their biological child may inherit a potentially
lethal disease. A Punnett square can be used to predict
the inheritance of the sickle cell genes (Figure 16.12).

FIGURE 16.11 The life cycle of the malaria parasite in the human body. The Plasmodium falciparum parasite enters a human when an infected mosquito bites a person and draws blood. A form of the parasite called the sporozoite enters the bloodstream and travels to the liver. The parasite infects the liver cells, multiplies into merozoites, and causes the liver cells to rupture. Once in the bloodstream, the merozoites infect red blood cells, where they continue to develop into ring forms, then trophozoites (a feeding stage), then schizonts (a reproduction stage), then back into merozoites. When parasite gametocytes are produced in the bloodstream, they are taken up by a feeding mosquito, where they are ready to continue the life cycle in the next bitten human.


Chapter | 16 Genes and Race


Two parents who are carriers of sickle cell each have
one copy of the mutant beta-globin allele (heterozygous)
and one wildtype copy, which gives each of their
biological children a 25% chance of inheriting two
mutant beta-globin genes and having sickle cell disease
(see Figure 16.12). Each child also has a 25% chance of
inheriting two wildtype beta-globin genes, which means
normal hemoglobin but no protection against malaria
from sickle cell trait. Finally, each child has a 50%
chance of being heterozygous: inheriting one mutant and
one wildtype beta-globin allele, being a genetic carrier
of the sickle cell trait, and having some protection
from malaria.
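The Punnett square reasoning above amounts to a short calculation: pair every allele one parent can pass on with every allele the other parent can pass on, then count how often each genotype appears. This Python sketch assumes a single gene with simple Mendelian inheritance and uses the H (wildtype) and S (sickle cell) allele labels from Figure 16.12; the function name `punnett` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett(parent1, parent2):
    """Probability of each offspring genotype for a single gene.

    parent1, parent2: two-letter genotypes such as "HS", where H is the
    wildtype beta-globin allele and S is the sickle cell allele. Each parent
    passes on either of its two alleles with equal probability.
    """
    # Each (egg, sperm) pair is one cell of the 2 x 2 Punnett square.
    # Sorting the letters makes "SH" and "HS" count as the same genotype.
    counts = Counter(
        "".join(sorted(egg + sperm)) for egg, sperm in product(parent1, parent2)
    )
    total = sum(counts.values())  # 4 cells in the square
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Two carrier (heterozygous) parents, as in Figure 16.12:
print(punnett("HS", "HS"))
# {'HH': Fraction(1, 4), 'HS': Fraction(1, 2), 'SS': Fraction(1, 4)}

# A carrier and a non-carrier parent:
print(punnett("HS", "HH"))
# {'HH': Fraction(1, 2), 'HS': Fraction(1, 2)}
```

These probabilities apply independently to every pregnancy, so two carrier parents who already have one child with sickle cell disease still face a 25% chance with the next child. The second call shows that when only one parent is a carrier, no child can inherit two S alleles, although half are expected to be carriers.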


A Molecular Mechanism to Explain 
Malaria Resistance 


The intriguing connection between sickle cell trait and
malaria stems from observations that individuals who
inherit the sickle cell trait (allele) are also protected
from malaria. Exploring the genetics and biology of
sickle cell disease has helped scientists to interpret the
complex influence of evolutionary pressures on the
inheritance of these disease genes. Researchers have
investigated possible mechanisms to explain how the
sickle cell trait protects against malaria, and the
evidence indicates that the protection comes from the
mutant beta-globin (HbS) allele itself rather than from
a separate gene inherited along with it. One proposed
mechanism is that red blood cells in carriers sickle more
readily when the parasite infects them, so that infected
cells are removed from the circulation before the
parasite can multiply.

Scientists have also studied a null (knockout) mutation
in the Duffy gene, which protects against a different
malaria parasite. The Duffy gene is located on
chromosome 1, whereas the beta-globin gene is on
chromosome 11, so the Duffy knockout allele is inherited
independently of the sickle cell trait. The Duffy null
(knockout) phenotype is most common in people whose
ancestors came from regions of Africa where malaria is
endemic (native to the region).

How might the Duffy protein be involved in
malaria? Interestingly, the Duffy protein is normally
located on the outside surface of red blood cells, where
it is used by the malaria parasite Plasmodium vivax to
attach to and enter the red blood cells. People who
inherit two mutant Duffy alleles do not make Duffy
protein on their red blood cells, which dramatically
reduces the ability of P. vivax to attach to and invade
those cells and greatly reduces their chances of a
vivax malaria infection (see Figure 16.9).

African Americans in the United States have a higher
frequency of the sickle cell allele, and a higher incidence
of sickle cell disease, compared with European Americans.
But many people who are not black and do not live in
the United States also carry the sickle cell allele or have
sickle cell disease. These people often trace their
ancestry to countries such as Greece or Saudi Arabia,
where malaria has also historically been common. Most
African Americans in the United States inherited their
sickle cell and malaria genetic heritage from ancestors
who were brought to the New World as enslaved people,
mostly from West Africa, where malaria is endemic.
Genetics and environment provide an ideal setting for
evolutionary pressure to select for carriers of sickle cell
who are protected from malaria.


FIGURE 16.12 What are the chances that a child will inherit sickle cell disease? (A) The Punnett square is a simple grid diagram that geneticists use to organize the parental alleles (egg and sperm) so that the alleles can be used to predict the genotypes of the offspring. The Punnett square provides a summary of the possible combinations of one maternal allele with one paternal allele for each gene. In this case both parents are heterozygous for the beta-globin sickle cell allele: (H, S) and (S, H). The H and S alleles are placed across the top row and down the left column of the grid. To predict the genotypes of the offspring, recall that each fertilization event involves a sperm and an egg, each carrying one copy of each chromosome, gene, and allele. The colored arrows indicate how each allele forms the genotypes of the fertilized zygotes. (B) The genotypes of the possible offspring from two carriers of sickle cell disease are shown in the center of the Punnett square: Homozygous (HH) (normal or wildtype) offspring inherited two normal beta-globin alleles (25%). Homozygous (SS) (sickle cell disease) offspring inherited two mutant beta-globin alleles (25%). Heterozygous (HS) (sickle cell carriers) offspring inherited both a wildtype beta-globin allele (H) and a mutant sickle cell beta-globin allele (S) (50%).


It is clear why scientists think it is misleading to label sickle
cell anemia (or any disease) a racial disease (or any drug as
a racial drug). Sickle cell carriers of any ancestry are protected
from malaria by the mutant beta-globin allele itself, and the
allele is common wherever malaria has long been widespread.
Over time, sickle cell carriers resistant to malaria have a
selective survival advantage because they are less likely to
succumb to malaria.


THE CONCEPTS OF RACE AND 
INTELLIGENCE 


Among all the possible issues of racial disparity, those
involving race, intelligence (IQ), and genes are perhaps
the most difficult to interpret without bias. From an
evolutionary perspective, it is unlikely that any modern
human population has been isolated long enough for
genetically distinct races to evolve. As humans migrated
out of Africa and around the world, there must have
been enough contact between the villagers moving on
and those staying behind to prevent long-term isolation
(see Chapter 8). Earlier studies traced the migration of
humans out of Africa and onto the other continents, and
now the science of molecular genetics has confirmed
these conclusions.

Scientists analyzed the lineage of DNA sequences
from the 22 autosomal (non-sex) human chromosomes,
the X and Y chromosomes, and the mitochondrial DNA
to study human evolution. The Y chromosome is used to
trace the male lineage from fathers to sons, and the
mitochondrial DNA genome is used to trace the female
lineage from mothers to their sons and daughters. The
results from studies using different methods have been
remarkably consistent and support the idea that the
human population has not been subjected to long
enough periods of geographical isolation to allow
different human races to evolve naturally. For this study
the scientists defined the different races in the context
of a subspecies, but others point to the undeniable
differences in average IQ scores, about one standard
deviation or 15 points, between white and black
Americans.


Limitations of the Standard IQ Test 


The IQ test was designed to measure intelligence, but
it is difficult to believe that a human characteristic
as complicated and subjective as intelligence can be
represented by a simple test. Nonetheless, IQ testing
is commonly used in the United States and is widely
regarded as an indicator of intelligence.

The history of IQ testing gives interesting insights
into how useful this test is for assessing this valued
human quality. When Francis Galton invented the
pseudo-science of eugenics, based on the erroneous
belief that every human trait is inborn, he legitimized
a concept that has often been misused for the benefit
of prejudiced people. In 1869, Galton published his
book Hereditary Genius, which was widely viewed as
the first scientific investigation of intelligence. He
originated the idea that intelligence can be measured by
a test. One of Galton’s students, James McKeen Cattell,
brought intelligence testing to America for the first time
in the 1890s. Ironically, particularly given the extensive
use of testing in U.S. schools today, Cattell’s work
soon fell out of favor because the results did not
correlate well with the students’ success in school.

At about the same time in France, Alfred Binet was
asked by school authorities to devise a test that could
accurately identify good and poor French students.
Binet worked with average and disabled students and
decided which activities a “normal” child should be
able to do at specific ages, leading to the notion,
popular at the time, that it was possible to determine a
child’s “mental” age. German psychologist Wilhelm Stern
proposed that the ratio between a child’s mental and
chronological ages was an indication of the child’s
intelligence, and American Lewis Terman popularized the
term intelligence quotient (IQ). Despite extreme
reservations on the part of the inventors of the IQ test,
Stern and Binet, who always doubted that intelligence
could be measured by a simple test, the IQ test soon
gained wide acceptance across America. The Army
adopted the IQ test in 1917 as a way to rapidly
determine which draftees were suited for which jobs in
the military. Although the original IQ tests were
designed to be given individually in an interview setting,
the Army developed rapid group IQ tests for potential
soldiers. Soon after, the IQ test found its way into
almost every school system, public and private, in the
country.

The controversy over IQ testing continued to grow,
and in the 1960s and 1970s the IQ test fell out of favor,
in part because the test contained a number of
culturally biased questions easily understood by white
suburban middle-class children but unfamiliar to black
children from the city. In time, the IQ tests were revised
to be more culturally neutral, and they are once again
used widely in public schools. Critics include Howard
Gardner, a Harvard professor who proposed the concept
of “multiple intelligences” in 1983. Today, intelligence is
generally regarded as very complex, with many
components and interacting variables, rather than
something tangible that can be measured by a single
number or even by a battery of tests.
Stereotype Threat Could Explain Racial Bias


Claude Steele (Stanford University) and his colleagues
gave black and white students with comparable SAT
verbal scores a difficult verbal test drawn from the
Graduate Record Exams. Surprisingly, black students
scored lower than white students when the test was
presented as a measure of cognitive ability, but the
difference disappeared when the same test was
presented as a problem-solving task. Similar results
were obtained when women were compared with men
and when white students were compared with Asian
students on math tests. In situations where students
expect to do poorly, because a negative stereotype about
their group has led them to expect it, the students
follow through with the perceived expectation and tend
to do poorly. On the other hand, when students feel
confident about their abilities, they tend to do well.
These test-taking tendencies are known as
“stereotype threat.”

Stereotype threat is an important factor that may help
explain differences in test performance among many
different groups. Before we ascribe such differences
entirely to innate abilities (i.e., genes), it is important to
rule out the effects of external factors in the
environments in which the different groups live.


It is fair to say that human intelligence is far too complex
to be measured or quantified by a single number, like IQ;
the IQ discrepancies between African Americans and
European Americans may be better explained by
stereotype threat than by genetics.


SUMMARY 


History tells us that since about 1684 the idea of race
has been created and promoted by people as a
social and cultural concept. For whatever reason, most
modern societies classify people according to the
prevailing concept of race, although most societies do not
adhere to the rigid apartheid rules used by the former
South African government to control black people.
It is common for people to categorize others by physical
characteristics such as short and tall, left-handed
and right-handed, or plump and skinny, and by talents
such as athletic and nonathletic. People are also
categorized, often in a negative way, by skin color and
by other physical attributes, such as slanted eyes or a
wide nose, that are commonly recognized as
“racial traits.”

The concept of race was a human invention of the
seventeenth century, used as a mechanism to perpetuate
racism, the idea that one group of people is inherently
(genetically) inferior to another group. There is no
scientific evidence to support the idea that humans
evolved into different races, so the current practice of
categorizing people by race using labels such as
African American, Chinese American, and so on, primarily
for the sake of clarity, must be accompanied by a
statement about the terrible impact of discrimination due
to racism and prejudice.


REVIEW QUESTIONS 


1. What are the arguments for and against consider- 
ing race in humans to be a biological concept ver- 
sus a social construct? 

2. How has the term race been viewed from the
perspective of history?

3. Why is it possible to say that race is not well 
defined from a genetic point of view? 

4. Is there more variation within groups called races 
or between them? Explain the significance of the 
distribution. 

5. Explain how the geographical origins of humans 
are reflected in the differences between human 
genome DNA sequences. 

6. What is stereotype threat, and what does it have to 
do with the issue of the difference in intelligence 
among the races? 

7. Explain what cancer tells us about the relationship 
between genes and the environment. 

8. Define what is meant by the new field of 
pharmacogenomics. 

9. What factors can explain disparities in the health 
of different races? 

10. Explain some of the complexities in the relation- 
ships among race, intelligence, and IQ. 
3’ cohesive end Single-stranded complementary DNA end
containing a 3’ OH (hydroxyl group)


5’ cohesive end Single-stranded complementary DNA end 
containing a 5’ phosphate group 


5' to 3’ direction Starting at the 5’ end of a DNA strand and 
reading toward the 3’ end of the same DNA strand 


adenine (A) A purine base found in the nucleic acids DNA
and RNA in the cells of plants and animals


adult stem cells Undifferentiated cells growing in a
specialized tissue


affinity chromatography Method to separate molecules 
based on a highly specific chemical interaction between mol- 
ecules and the affinity matrix 


alkaline Having a pH greater than 7.0; also called basic 


alleles One particular version of a gene, or more broadly, 
a particular sequence of any location (locus) on a molecule 
of DNA 


allelic frequency The percentage of any allele in a
population’s gene pool


allotropes Two or more different structures of a chemical
element


amino acids Any of a class of organic compounds
containing amino (NH2) and carboxyl (COOH) groups;
the main constituents of proteins


aminoacyl tRNA synthetase Enzyme that attaches an amino 
acid to a tRNA molecule 


aneuploidy A condition in which a cell inherits the wrong 
number of chromosomes (too many or too few) 


angiogenesis The development and growth of new blood
vessels


anthers Plant structures that contain pollen 


anthropologists Scientists who study the origin, behav- 
ior, and the physical, social, and cultural development of 
humans 


anticodon Group of three consecutive bases in tRNA that 
are complementary to a three base codon in the mRNA 


antigens Molecules that cause an immune response in the 
body and are recognized and bound by antibodies 


antiparallel Arrangement in which the two DNA strands of
the helix, each with 5’ to 3’ polarity, run in opposite directions


anti-toxin An antibody or other protein produced in the 
body in response to the presence of a toxin or poison 


apoptosis Programmed cell death; cell suicide 


archaea One of three major branches of life (the other two
being bacteria and eukaryota); it includes halophiles and
thermophiles


asexually propagated The deliberate reproduction of whole 
plants using vegetative cells, tissues, or organs 


assimilation model One of three theories of the evolution
of Homo erectus into the anatomically modern human (Homo
sapiens); it proposes that the recent African migrants did
interbreed with some other hominid groups, but that the
degree of interbreeding varied greatly from one geographic
region to another and from one time period to another


attenuation The step-wise decrease in expression of a par- 
ticular gene 


autism A spectrum of disorders that cause significant 
impairment of mental function and social behavior 


autologous stem cell transplants Treatment in which the 
patient’s stem cells are used to replace damaged or diseased 
tissues in the patient’s body 


bacterial artificial chromosome One type of vector used to 
clone DNA fragments 


bacterial chromosome Structure within bacterial cells that
contains the bacterial DNA


bacterial restriction and modification system A system that
protects bacteria against bacteriophage infection by
destroying the bacteriophage DNA but not the bacterial
chromosome


bacterial strains Related but genetically distinct bacteria


bacteriophages Viruses that infect bacterial cells


baculovirus DNA virus that infects insects and related inver- 
tebrates and is widely used as a vector for animal genes 


bases Alkaline chemical substances; in molecular biology 
refers to the cyclic nitrogen compounds found in DNA and 
RNA 
benign tumors Tumors that only grow in a single location in
the body and do not metastasize to other tissues and organs 


biodiesel A fuel that is derived from soybean, canola,
sunflower, palm, and other agricultural crop oils, as well as
from vegetable oil recycled from restaurants


biofuels Sustainable fuels that come from plants 


bioinformatics The computerized analysis and manipula- 
tion of large amounts of biological sequence data including 
DNA, RNA, and amino acid sequences 


biolistic gene gun Gene delivery system designed to shoot 
DNA projectiles directly into sub-cellular compartments in 
eukaryotic cells 


biological control (biocontrol) Approaches that use parasites, 
predators, or pathogens to control an unwanted organism 


biological database A collection of biological information 
including DNA, RNA and protein sequences organized for 
efficient access to data 


biomarkers Specific proteins whose levels change
measurably with the onset of a disease


biopesticide A pesticide composed of a biological control 
agent 


blastocyst A very early stage in embryo development 


blunt ends Ends of a double-stranded DNA molecule that
are fully base paired, with no unpaired single-stranded
overhang


bottleneck Evolutionary event in which most of the indi- 
viduals in a population are killed or are otherwise unable to 
reproduce 


cancer genome A set of DNA mutations that act together to 
convert a normal cell into a cancer cell 


capsid Protective protein layer that surrounds the DNA or 
RNA genome in a virus particle 


carcinogen An agent that causes a normal human cell to 
become a cancer cell 


cell division cycle Series of stages that a cell goes through 
between cell division events 


cell-free extract Subcellular fraction created in vitro that 
retains biological activity 


cellulose Structural polymer of β-1,4-linked glucose that is a
major component of plant cell walls


cell wall The layer covering a plant cell outside the plasma
membrane


central dogma Basic flow of genetic information in living
cells starting with genes (DNA), then messenger RNA (mRNA),
and finally, proteins


centromere Region on each eukaryotic chromosome where 
microtubules attach to move chromosomes during mitosis 
and meiosis 
cetane number A measurement of the combustion quality
of diesel fuel during compression ignition 


chaperones Proteins that help newly synthesized proteins 
fold into proper three-dimensional structures 


chimera A hybrid created by fusing together proteins (or 
genes) from two species 


chloroplasts Specialized membrane-bound organelles that 
are the sites of photosynthesis in green plants 


chromatid One of the two identical copies of a chromosome
produced by DNA replication


chromatin DNA-protein complexes that are major compo- 
nents of eukaryotic chromosomes 


chromosomes Structures in the cell nucleus that each con- 
tain one linear double-stranded DNA molecule 


clone A genetically identical population of cells derived 
from a single parent cell 


cloned gene A recombinant DNA gene carried in a DNA 
vector that is propagated in pure populations of cells 


cloning The production of identical copies of a specific
DNA fragment, or the growth of genetically identical cells
from a single parent cell


codominance A situation in which gene products from both 
alleles of a gene are made in the cell and both proteins influ- 
ence phenotype 


codons A group of three consecutive RNA or DNA bases 
that encode a single amino acid 
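Reading a gene codon by codon can be illustrated in a few lines. The sketch below is a hypothetical illustration, not a complete tool: it splits a DNA coding sequence into consecutive triplets and looks each one up in a deliberately partial excerpt of the standard genetic code (only six codons are included).

```python
# Minimal sketch: split a coding sequence into codons (triplets) and
# translate them. CODON_TABLE is a hypothetical, deliberately partial
# excerpt of the standard genetic code, used only for illustration.
CODON_TABLE = {
    "ATG": "Met",   # methionine, also the usual start codon
    "TTT": "Phe",   # phenylalanine
    "GGC": "Gly",   # glycine
    "AAA": "Lys",   # lysine
    "TGG": "Trp",   # tryptophan
    "TAA": "STOP",  # stop codon: translation ends here
}


def split_codons(dna):
    """Split DNA into consecutive, non-overlapping triplets (leftover bases dropped)."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Translate codons in order until a stop codon is reached."""
    protein = []
    for codon in split_codons(dna):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


print(split_codons("ATGTTTGGCTAA"))  # ['ATG', 'TTT', 'GGC', 'TAA']
print(translate("ATGTTTGGCTAA"))     # ['Met', 'Phe', 'Gly']
```

Because the triplets do not overlap, shifting the starting position by one base would produce a completely different set of codons.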


cohesive end sites (cos) Complementary single-stranded 
cohesive ends of the lambda (λ) viral genome DNA


coincidental match A questionable DNA match scored 
between a suspect and the evidence 


column Vertical tube used in chromatography methods 


combinatorial control Control of gene expression involving 
the presence or absence of a particular combination of regu- 
latory proteins in different cells 


Combined DNA Index System (CODIS) The Federal Bureau 
of Investigation computer system that enables local, state, 
and federal law enforcement officers to search the forensic 
DNA databanks of law enforcement agencies throughout the 
United States 


commercial biocrude Biofuels that are used commercially 


conjugation Process in which genes are transferred between 
bacteria by cell-to-cell contact 


complement activation A process in which a series of com- 
plement proteins launch a stepwise cascade that leads to the 
assembly of protein complexes that make holes in the mem- 
brane of the target cells 


complementary DNA (cDNA) library A DNA collection 
containing the DNA copies of RNA transcripts produced in 
the particular cell type used to generate the DNA clones in
the library 


consensus sequence A sequence created by compar- 
ing many promoter sequences and choosing the bases that 
appear most frequently at each position 
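The column-by-column majority vote behind a consensus sequence can be sketched as follows. This assumes the input sequences are already aligned to the same length; the four short sequences in the example are made up for illustration and are not real promoters.

```python
from collections import Counter


def consensus(sequences):
    """Return the most frequent base at each position of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    result = []
    # zip(*sequences) walks the alignment one column (position) at a time
    for column in zip(*sequences):
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)


# Hypothetical aligned sequences, for illustration only
aligned = ["TATAAT", "TATGAT", "TAAAAT", "GATACT"]
print(consensus(aligned))  # TATAAT
```

Note that the consensus need not match any single input sequence exactly; here no sequence other than the first is identical to it. When two bases tie at a position, this sketch keeps the one seen first in that column.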


conserved Refers to an amino acid sequence in a protein 
that has remained essentially unchanged throughout evo- 
lution, or a DNA or RNA sequence that is common in the 
genomes of different organisms 


constitutive A type of gene promoter that almost always 
expresses high levels of RNA and the encoded protein 


continuous replication Refers to the mechanism of DNA 
replication used to synthesize the leading strand of DNA at 
each replication fork 


control region The part of a gene containing the promoter 
and regulatory DNA sequences that control transcription or 
replication 


cosmid vectors Vectors that can carry very long DNA frag- 
ments, enter the cells with high efficiency, and replicate as 
independent plasmids in the cells 


covalent modification Altering the structure of a DNA, 
RNA or protein molecule by enzymatic means, changing 
the chemical properties of that macromolecule; for example: 
phosphorylation and methylation


cyclin-dependent kinases (Cdk) Special protein kinase
enzymes that are activated by cyclin proteins and control
eukaryotic cell division


cyclins Family of proteins that fluctuate with the cell cycle; 
at high levels cyclins bind to Cdk to make MPF (mitosis pro-
moting factor) and control the cell cycle 


cytokines Short peptides that stimulate the growth of 
immune cells 


cytokinesis The final stage of cell division that physically 
separates the two daughter cells 


cytological staining Method of staining cells so that specific 
subcellular structures are visible when the stained cells are 
magnified in a light microscope 


cytoplasm The protoplasm surrounding the nucleus of a 
eukaryotic cell 


database mining Searching the vast amount of information 
in a database to find the few pieces of data of interest to the 
researcher 


diploid The genome complement of cells that inherit two 
copies of each chromosome and two copies of each gene 


discontinuous replication Refers to the mechanism of DNA 
replication used to synthesize the lagging strand of DNA at 
each replication fork 


disease-specific Pertaining to a particular disease 


divergence A process in which differences in nucleic acid 
and protein sequences accumulate over time 
DNA (deoxyribonucleic acid) Nucleic acid polymer of
nucleotides (bases) that makes up the genes


DNA hybridization probes Tools used to identify specific 
target DNA sequences using a library or genome screen 


DNA ligase Enzyme that catalyzes the formation of a cova- 
lent chemical bond between the 3’ and 5’ ends of two DNA 
strands, sealing the DNA backbone and joining DNA frag- 
ments end to end 


DNA polymerase An enzyme that copies DNA templates 
into new DNA strands when chromosomes are being repli- 
cated 


DNA profile The genotypes and other chromosome markers 
that identify the unique genome sequence of each individual 


DNA replication origin Site on the DNA helix where DNA 
replication begins 


dominant A gene that directs a phenotype even when 
present at only one copy per cell 


dominant allele A gene allele that determines a phenotype 
even when present at only one copy per cell 


E. coli DNA polymerase I One of several DNA polymerase
enzymes that replicate DNA in E. coli cells


electroporation Technique that uses an electric field to 
make cells take up DNA 


elongation factor Proteins that are required for the elonga- 
tion of a growing polypeptide during translation (protein syn- 
thesis) 


embryonic stem cell lines Cell lines generated from
embryonic stem cells (ESCs)


embryonic stem cells (ESCs) Stem cells derived from the 
inner cell mass within the blastocyst embryo 


endonuclease An enzyme that cleaves internal covalent 
bonds and degrades DNA molecules 


enhancers Regulatory sequences in the genome DNA that 
are often located at long distances from the promoter region of
eukaryotic genes 


enol A specific chemical structural conformation of a DNA
base


enucleated egg cell An egg cell (oocyte) from which the 
nucleus has been removed 


enzyme A protein that acts as a catalyst, increasing the rate 
at which a chemical reaction occurs, without itself changing 
in molecular structure 


epigenetic Refers to inherited phenotypes that are not due 
to changes or mutations in DNA sequence 


epigenetic memory A process of protein modification that 
allows cells to maintain an undifferentiated state 


epigenetic modification Changes that impact the entire 
genome, altering gene expression and changing the state of 
the cell 
epigenome The newly reprogrammed genome in the devel-
oping embryo 


epitope Localized structure in an antigen to which an anti- 
body binds 


Escherichia coli (E. coli) A species of bacterium that nor- 
mally lives in the human gut and is commonly used in 
research in genetics and molecular biology 


eukaryotic The type of biological cell containing a nucleus 
and chromosomes 


exonucleases Enzymes that cleave single mononucleotides 
from the 5’ or 3’ end of the DNA strand 


Expect value (E) For a BLAST search, this value measures 
the number of hits (matching sequences) that can be expect- 
ed by random chance using a particular query (the lower the 
E-value, the more likely that the sequences are related to each 
other) 
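The E-value described above comes from Karlin-Altschul statistics. As a sketch: the formula was derived for ungapped local alignments, and for gapped alignments BLAST uses the same form with empirically estimated constants. The expected number of chance hits with score at least S is:

```latex
E = K \, m \, n \, e^{-\lambda S}
% m       : length of the query sequence
% n       : total length of all sequences in the database
% S       : score of the alignment (hit)
% K, \lambda : statistical constants that depend on the scoring
%              system (substitution matrix and gap penalties)
```

This makes the glossary's point concrete. E falls exponentially as the alignment score S rises, so strong matches have E-values near zero. Searching a larger database (larger n) raises E for the same score, because more sequences give more chances for a random match.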


expressed Describes a gene whose DNA region is copied into
RNA (transcription) and, for protein-coding genes, used to
make protein (translation) in living cells


expressed sequence tags (ESTs) A special type of cloned 
DNA derived from DNA sequences that were transcribed into 
RNA strands in the cell 


ex vivo When diseased cells are removed from the body, 
treated and returned to the patient's body 


FASTA Online program that compares the sequences of pro- 
teins or nucleotide sequences 


forensic DNA databanks Databases that contain the DNA 
profiles (DNA fingerprints) of people who have been convict- 
ed of crimes as well as DNA profiles from evidence samples 
collected at unsolved crimes 


free radicals Very reactive atoms or molecules that carry an
unpaired electron and can damage and mutate DNA genes and proteins


Galapagos Islands An archipelago of volcanic islands dis- 
tributed around the equator in the Pacific Ocean, where Dar- 
win studied evolution 


gametes Haploid egg or sperm cells produced for sexual 
reproduction 


gel electrophoresis A technique that uses an electric field to
move charged biomolecules through a gel matrix in order to sort
DNA, RNA and protein molecules by size


gene copy number The number of copies of each gene 
present in one genome 


gene expression The process by which genes send instruc- 
tions to cells through transcription and translation 


gene guns Gene delivery system that shoots DNA projec- 
tiles directly into the subcellular organelles in the target cells 
(nucleus, mitochondrion, chloroplast) 


gene knockout A DNA mutation made when a specific gene 
is removed from a genome or disabled by an insertion muta- 
tion that disrupts the gene 
genes Biological genetic units of heredity


gene-specific DNA probes Short single-stranded DNAs used 
to test for a specific mutant or wildtype gene in an individual's 
genome or library 


genetically linked When a gene or other chromosome 
marker and a particular gene are inherited together through 
many generations 


genetically modified organisms Plants and animals that 
have had their genetic makeup altered to exhibit traits or pro- 
duce proteins that are not typical 


genetic code The sequence of tandem nucleotide tri- 
plets (codons) in DNA or RNA that specifies the amino acid 
sequence of a protein 


genetics The branch of biology dealing with heredity and 
the laws governing genetic inheritance 


genetic screen A method to search for rare cells using the 
genetic characteristics of the cells 


genome reprogramming The cell’s genome is reprogrammed 
at fertilization, essentially initiating a pluripotent state and 
permitting embryo development to occur 


genomic DNA library A DNA library containing 2 to 3 cop- 
ies of all the DNA fragments from an entire genome 


genomic map A diagram showing the relative positions of 
every gene and other locus indicated along the linear DNA 
molecule contained in each chromosome 


genomics The study of the structure and function of the 
genomes of all organisms 


genotype The DNA sequence characteristics of the alleles 
for all the genes in a specific individual genome 


germ line Reproductive cells that produce egg or sperm 
cells required to generate the next generation (in eukaryotic 
organisms) 


glycosylation Posttranslational modification reaction in 
which enzymes catalyze the addition of sugar units (carbohy- 
drates) to specific amino acids in certain proteins 


guanine (G) One of the five fundamental bases that make 
up DNA and RNA sequences 


haploid Having inherited a single set of chromosomes 


helicase An enzyme that unwinds the DNA double helix at 
each replication fork 


hematopoietic stem cells (HSCs) Stem cells that generate all 
of the different types of specialized cells in the mammalian 
blood and immune systems 


herpesvirus A DNA virus that causes a variety of diseases 
including tumors; the virion contains a double-stranded DNA 
genome and an outer envelope surrounds the nucleocapsid 


heteroduplex A molecule in which an RNA strand is base 
paired to its complementary DNA strand 
heterosis When a hybrid F1 generation created by cross-
ing two parental lines (P1 X P2) exhibits desirable traits that 
exceed the traits of both parents 


heterozygous Having inherited two different alleles of the 
same gene on two homologous chromosomes 


histone Special positively charged proteins that bind to 
DNA and create and alter the dynamic structure of eukaryotic 
chromosomes 


hominids Human-like primates that walked upright on two
feet


homologous Similar in DNA, RNA or amino acid sequence


homologous recombination Recombination or genetic
exchange between two DNA regions that are identical, or
very similar in DNA sequence


homozygous Having inherited the same form (allele) of a 
gene on two homologous chromosomes 


homozygous knockout Second-generation transgenic ani- 
mals that carry knockout alleles of the same gene on both 
chromosomes 


hormone Molecules, often short proteins, which carry sig- 
nals to different cells and tissues inside multicellular organ- 
isms 


host cells Living cells invaded by an infectious agent or tar- 
get cells carrying foreign DNA usually introduced into cells 
on a vector 


human artificial chromosomes (HACs) Recombinant mini- 
chromosomes that replicate and segregate just like native 
chromosomes when introduced into human cells 


human immunodeficiency virus (HIV) A retrovirus that 
causes AIDS 


humanized Replacing parts of a foreign protein with the 
equivalent human amino acid sequences 


hybridization Base pairing of single strands of DNA or RNA 
to each other via hydrogen bonding between the complemen- 
tary bases 


hybridomas Hybrid cells made by researchers in which an 
antibody-producing B cell is fused with a myeloma cell to 
form a self-proliferating cell that produces only one specific 
monoclonal antibody 


hybrid vigor When a hybrid F1 generation created by cross- 
ing two parental lines (P1 X P2) exhibits desirable traits that 
exceed those of both parents 


immunology Scientific field that focuses on the study of the 
physical, chemical, and physiological characteristics of the 
components of the human immune system in healthy and dis- 
eased states 


immunosuppressive drugs Agents capable of suppressing 
immune responses in the human body 


induced pluripotent stem cells (iPS cells) Type of pluripo- 
tent stem cell artificially derived by introducing transcription 
factor genes into highly differentiated adult cells, which trig-
gers genome reprogramming and a pluripotent state 


inducer Molecule with a regulatory influence on gene 
expression 


inducible Describes a type of promoter or operon that turns 
on gene expression in response to an inducer signal 


informed consent Agreement by a patient to participate in 
an experimental treatment only after the patient understands 
the risks involved 


inheritance The acquisition of characteristics, qualities or 
traits by the transmission of genes from parent to offspring 


initiation factors Proteins that are required to initiate syn- 
thesis of a new polypeptide 


insulin Small protein hormone made by the pancreas cells 
that controls the level of sugar in the blood and does not func- 
tion properly in diabetes 


integrate To insert a DNA strand into a DNA chromosome 
(or plasmid) 


intelligence quotient (IQ) The ratio of a person's mental age
to his or her chronological age, multiplied by 100


interphase The part of the cell cycle between two succes- 
sive cell divisions, during which cellular metabolism and 
DNA synthesis occur 


interrupted genes DNA genes that contain both intron and
exon sequences


in vivo Refers to processes in living cells, including the treat- 
ment of diseased cells inside the body 


karyotype The full set of mitotic chromosomes in the 
nucleus that is characteristic for each eukaryotic species; 
human nuclei contain 23 or 46 chromosomes 


keto A specific chemical structural conformation of a DNA 
base 


kilobase pair (kb) A unit of 1000 consecutive DNA base
pairs


knockout animal A transgenic animal that has had a specific 
gene inactivated or deleted from its genome 


knockout gene A gene that has been inactivated or deleted 
from the genome of an organism 


lac operon An inducible operon encompassing three 
genetic loci involved in the uptake and breakdown of lactose 
in E. coli 


lagging strand The DNA synthesized in short single-strands 
called Okazaki fragments during discontinuous DNA replica- 
tion 


lambda (λ) A bacteriophage that infects E. coli cells


leading strand The strand of DNA that is synthesized 
continuously during DNA replication and does not contain 
Okazaki fragments 
liposome fusion A vesicle used to transfer DNA genes into
cells by fusing the vesicle with the cell plasma membrane 


locus A specific position or location on a chromosome 
DNA 


logic gates In DNA computers, a series of wells containing 
DNA molecules that respond to input information in speci- 
fic ways depending on the secondary structure of the DNA 
strand 


lysogenic A viral pathway in which the virus genome
DNA integrates directly into the host chromosome as a
prophage, which is transmitted to daughter cells at each
cell division and can later be induced to produce new
phages through the lytic cycle


macrophage Large, mononuclear, highly phagocytic cells 
derived from monocytes 


major groove The wide groove in each DNA helix (com- 
pared to the narrower, minor groove) 


male sterility In plants, the failure to produce functional 
anthers, preventing sexual reproduction 


malignant tumors Cancer cells that spread from a primary 
tumor to other body locations


markers Polymorphisms and other genetic loci and land- 
marks identified on human chromosome maps 


meiosis The process of cell division by which reproductive 
cells are formed and chromosome number is reduced by half 


mesenchymal stem cells (MSCs) Multipotent adult stem 
cells that can differentiate into a variety of muscle-related cell
types


messenger RNA (mRNA) RNA that encodes proteins and is
made by transcribing genes


metastasis Process in which cancer cells migrate away from 
a primary tumor and move around the body to form second- 
ary cancers at other locations 


methylation Posttranslational modification in which methyl 
groups are added to specific amino acids in proteins to influ- 
ence protein functions 


microarray technology A technique that permits the simul- 
taneous detection of all of the mRNAs transcribed from thou- 
sands of genes in a genome at any one point in time 


micro RNAs (miRNAs) Small regulatory RNA molecules
made in eukaryotic cells


mini-chromosomes Small chromosome-like DNA-protein 
structures that sometimes form in different types of eukaryotic 
cells 


minor groove The narrower groove in each DNA helix mol- 
ecule 


mitochondrial DNA A double-stranded circular DNA mol- 
ecule that contains only 37 genes and is located within the 
mitochondria in the cytoplasm of eukaryotic cells 


mitosis The process of cell division that results in the forma- 
tion of two daughter cells, without changing the chromosome 
numbers of the cells involved 


mitotic chromosomes Highly compact structures composed 
of histone proteins bound to long linear double-stranded 
chromosome DNA 


model Certain organisms such as the mouse, fruit fly and 
bacteria are the focus of active research and are now very 
well characterized organisms 


monoclonal antibody (mAb) A pure population of identi- 
cal antibody proteins with a unique sequence that recognizes 
only one specific antigen (made by a cell line cloned from a 
single B cell) 


monogenic diseases Diseases caused by mutations in single
human genes


monomer A protein molecule of relatively low molecular 
weight that can bind to itself and other proteins to form dim- 
ers, trimers, or other protein complexes, sometimes bound to 
DNA or RNA 


motifs Amino acid sequences that are conserved among 
proteins from many different organisms 


MPF (maturation promoting factor or M-phase promoting 
factor) The cyclin-Cdk protein kinase is assembled when 
the cyclin proteins build up in the cell and bind to Cdk to
make the active protein kinase enzyme (MPF) 


multigenic Diseases and disorders caused by interactions 
among the products of more than one mutant gene and often 
several mutant genes 


multiple sequence alignment Comparing one sequence in 
parallel with several other sequences to look for similarities 
and differences at each position in the linear sequence 


multipotent Cells that are capable of differentiating into 
more than one cell type but cannot generate all of the cell 
types needed in the human body 


multiprotein Consisting of several proteins that interact 
together in a complex 


multiregional model One of three theories of the evolution
of Homo erectus into the anatomically modern human (Homo
sapiens), which proposes that the original Homo erectus
population migrated into Europe and Asia and then Homo erectus
evolved into Homo sapiens simultaneously in several different
geographic regions


mutagen An agent that induces genetic mutation by chang- 
ing DNA sequence (for example, a “G” changed to a “T”) 


mutagenesis The creation of a genetic mutation by changing 
the DNA sequence 


mutagenic Having the ability to induce genetic mutation by 
changing the DNA sequence 


mutation Change in a DNA (or RNA) sequence 


myc One of the proto-oncogenes that encodes a transcrip- 
tion factor protein that normally regulates the expression of 
several different genes 
nanotechnology Technology dealing with structures smaller
than 100 nanometers in size that involves developing
materials or devices with novel properties; it includes
molecular manufacturing outside the cell as well as
molecular machinery inside the cell


neomycin A type of antibiotic used in genetic screens 


neomycin resistance gene (neoʳ) A gene often contained in
transgenic vectors used to produce transgenic animals; the
neoʳ enzyme inactivates neomycin and G418 and renders the
cells resistant to these antibiotics 


neural progenitor cells Another name for neural stem cells 


neural stem cells (NSCs) Multipotent adult stem cells that 
generate the specialized cells required for the function of the 
nervous system and brain 


neurospheres Multicellular structures generated by neural 
stem cells growing in tissue culture plates 


neurotransmitter Molecule that carries signals across the 
synapses, the gaps between the ends of the nerve cells 


noncoding RNA (ncRNA) RNAs made in cells that do not
encode a protein


non-protein-coding DNA (non-coding DNA) Regions of genome
DNA that do not carry the information necessary to make
proteins


nuclear DNA Found in the nuclei of eukaryotic cells; for 
example, each human cell nucleus contains 23 or 46 chro- 
mosomes 


nucleoid A DNA-containing region lacking a surrounding 
membrane, found in prokaryotes 


nucleotides The monomer component of a nucleic acid 
(DNA and RNA), consisting of a pentose sugar plus a base 
and a phosphate group 


nucleus This spherical compartment in eukaryotic cells is 
enclosed within a double membrane (nuclear envelope) and 
contains all the chromosomes and one or more nucleoli (site 
of ribosome assembly) 


obesity Excessive accumulation of fat in the body, partly in 
response to gene expression 


Okazaki fragments The short single-stranded DNA synthe- 
sized on the lagging strand during DNA replication 


oncogene The mutant form of a gene that promotes cancer 
development 


one gene-one enzyme hypothesis The early idea that each 
gene in a cell can produce only one specific protein or cel- 
lular enzyme 


operator Site on DNA where a repressor protein recognizes 
and binds to a specific DNA sequence 


operons Groups of prokaryotic genes that are transcribed 
together into a single polycistronic mRNA that is translated 
into two or more proteins 
organelle Membrane-bound compartments with distinc-
tive morphology and function present in the cytoplasm of all 
eukaryotic cells 


organic farming The process of producing food naturally, 
without the use of manufactured chemicals for pest and weed 
control and fertilizers 


organic food Foods grown without the use of chemicals, 
including pesticides and fertilizers; also meat from animals 
raised without hormones and other drugs 


origin of replication (ori) Site on a DNA molecule where 
DNA replication begins (initiation) 


pair-wise sequence alignment Comparing one DNA, RNA 
or amino acid sequence directly with one other DNA, RNA 
or amino acid sequence 
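The glossary does not name a method for pair-wise alignment. One classic choice is the Needleman-Wunsch dynamic-programming algorithm for global alignment, sketched below under an assumed toy scoring scheme (match +1, mismatch -1, gap -2) chosen purely for illustration. Search tools such as BLAST and FASTA instead use local-alignment heuristics and substitution matrices.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best global alignment score of sequence a against b.

    The scoring values are illustrative assumptions, not a standard matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair  # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap             # gap inserted in b
            left = score[i][j - 1] + gap           # gap inserted in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


print(global_alignment_score("ACGT", "ACGT"))  # 4
print(global_alignment_score("ACGT", "AGT"))   # 1 (three matches, one gap)
```

Each cell reuses the best scores of its three neighbors, so the whole table is filled in time proportional to the product of the two sequence lengths.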


palindrome A double-stranded DNA sequence that reads 
the same on the top strand (5' to 3’) as the complementary 
bottom strand sequence read backwards (5’ to 3’). For exam- 
ple, ACCTAGGT and its complement, TGGATCCA, represent 
a palindrome 
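The palindrome test in this entry amounts to comparing a sequence with its reverse complement. A minimal sketch, assuming an uppercase DNA sequence containing only A, C, G and T:

```python
# Base-pairing rules for DNA: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Bottom strand written beneath the top strand (so it reads 3' to 5')."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """Bottom strand read in its own 5' to 3' direction."""
    return complement(seq)[::-1]


def is_palindrome(seq):
    """True if the top strand (5' to 3') equals the bottom strand read 5' to 3'."""
    return seq == reverse_complement(seq)


top = "ACCTAGGT"
print(complement(top))          # TGGATCCA
print(reverse_complement(top))  # ACCTAGGT
print(is_palindrome(top))       # True
```

The first line of output reproduces the complementary strand given in the definition; reading it backwards gives back the original top strand, which is what makes the sequence a palindrome.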


patient-specific Having to do with the needs of an individ- 
ual patient 


phagocytosis The engulfing of microorganisms, other cells 
and foreign particles by specialized cells called phagocytes 


pharming The development and marketing of transgenic 
animals and plants to produce novel therapeutic proteins and 
commercial products 


phenotype Observable characteristics of an individual as 
directed by the expression of the individual’s genes 


phosphodiester bond The covalent chemical bond that links 
nucleotides in a nucleic acid polymer and consists of a central 
phosphate group esterified to flanking sugar hydroxyl groups


phospholipid cell membrane All biological membranes 
contain phospholipids 


phosphorylation Posttranslational modification reaction in 
which special enzymes add phosphate groups to selected 
proteins 


photosynthesis The process by which plants use the energy 
of sunlight to produce carbohydrates from carbon dioxide 


phylogenetics The study of evolutionary relationships 
among and between organisms and the involvement of 
genetic variation in these changes 


phylogenetic trees Diagrams showing the evolutionary rela- 
tionships among different organisms over time 


pilus A thin tube made in certain bacterial cells and used to 
pass the bacterial chromosome DNA from the donor cell to 
the recipient cell during conjugation (mating) 


placenta The tissue that joins the mother and embryo or 
fetus, allowing diffusion of nutrients from the mother's blood
into the fetus’s blood and diffusion of waste products from the 
fetus back to the mother 
plant protoplast cells Plant cells that have had their cell
wall completely or partially removed by either mechanical or 
enzymatic means 


plaque A clear circular zone that forms in a lawn of bacteria 
growing on a plate when a virus destroys the lawn of host 
cells 


plasmids Self-replicating double-stranded circular DNA 
molecules that are most often found in prokaryotic cells and 
also in some eukaryotic cells


ploidy Chromosome number specific to each type of cell 
and influenced by the stage of the cell cycle 


pluripotent The ability to develop into all of the types of 
cells in the human body 


pluripotent stem cells These cells exist in an undifferenti- 
ated state and have the potential to develop into the many 
types of specialized cells needed in the human body 


polarity In DNA this refers to the two chemically different 
ends of each DNA strand called 3’ and 5’; in the helix, the 
two DNA strands are arranged in opposite directions: 5’ to 
3’ and 3’ to 5’. In cell development, polarity can refer to the 
different biological activities taking place at the different ends 
(poles) of the cells 


pollen Tiny fine spores containing male gametes; for exam- 
ple, pollen carried in the anthers of a flowering plant 


polycistronic transcript An mRNA transcript that encodes 
multiple genes and is translated into multiple proteins in bac- 
terial cells 


polyclonal A population of different antibody proteins that 
specifically recognize different epitopes of the same antigen 


polymerase chain reaction (PCR) Method used to amplify a 
DNA sequence by repeated cycles of DNA strand separation 
and replication 
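Because each PCR cycle can at most double the number of target molecules, the expected yield grows exponentially. A minimal sketch, assuming a constant per-cycle efficiency (a hypothetical parameter: 1.0 means perfect doubling); real reactions fall below this ideal and eventually plateau as reagents are used up.

```python
def pcr_copies(starting_copies, cycles, efficiency=1.0):
    """Idealized copy number after PCR.

    Each cycle multiplies the target by (1 + efficiency); efficiency=1.0
    models perfect doubling. Constant efficiency is a simplifying assumption.
    """
    return starting_copies * (1 + efficiency) ** cycles


print(pcr_copies(1, 30))  # 1073741824.0, i.e. 2**30, about a billion copies
print(pcr_copies(10, 3))  # 80.0
```

Even a single starting molecule yields roughly a billion copies after 30 ideal cycles, which is why PCR can work from very small DNA samples.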


polymorphisms Locations in the DNA genome where the 
DNA sequences differ between different individuals 


posttranscriptional processing The processing of precursor 
(or primary) RNA transcripts to generate mature messenger 
RNAs (mRNAs) required for translation in eukaryotic cells 


posttranslational modifications Chemical modifications 
(phosphorylation, methylation, glycosylation) added by 
enzymes to selected proteins in eukaryotic cells 


precursor mRNAs Primary RNA transcripts copied from a gene
that are processed into mature mRNAs


primase Enzyme that participates in DNA replication by 
making RNA primers needed to initiate DNA replication 


primer A short RNA (or DNA) strand that is base paired to 
the template DNA strand at the position on the helix where 
DNA replication will start 


product rule States that the probability that a series of
independent events will all happen is equal to the product of
the probabilities of the individual events separately
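The product rule is simple to compute exactly with fractions. The example uses a standard Mendelian case: two heterozygous (Aa) parents have a 1/4 chance of producing a homozygous recessive (aa) child, and because each birth is independent, the chance that two children are both aa is 1/4 × 1/4.

```python
from fractions import Fraction


def probability_all(probabilities):
    """Product rule: P(all independent events occur) = product of their probabilities."""
    result = Fraction(1)
    for p in probabilities:
        result *= Fraction(p)
    return result


# Aa x Aa cross: each child independently has a 1/4 chance of being aa
print(probability_all([Fraction(1, 4), Fraction(1, 4)]))  # 1/16
```

Using Fraction keeps the answer exact, which mirrors how genetics problems are usually worked by hand.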
prokaryotic A type of biological cell that lacks a nucleus
and other organelles (bacteria) 


promoter Region of DNA in front of a gene that interacts
with the RNA polymerase when gene expression is turned on 


promoter strength The rate of transcription initiation from a
promoter in the absence of additional gene regulation; used
to classify promoters as strong or weak


pronuclei The parental male and female nuclei in a ferti- 
lized egg just before nuclear fusion 


protease enzymes Various enzymes that catalyze the 
hydrolytic breakdown of proteins into peptides or single ami- 
no acids 


protein-coding sequence The specific DNA sequence of a 
gene that codes for a specific amino acid sequence 


protein modeling Computer models of protein structures 
are used to predict the shapes of proteins and study function 


proteins Unbranched chains of amino acid units that fold 
into functional 3D structures and perform countless jobs in 
cells 


protein synthesis The production of proteins by the process 
of translation in cells 


proteome The total set of proteins encoded by a specific 
cell’s genome or the total protein complement of an entire 
organism 


proteomics Study of the complete protein complement of 
each organism 


proto-oncogenes Unmutated (wildtype) form of oncogenes 
that encode proteins with important cellular functions 


protoplasts Plant cells without cell walls 


provirus Virus genome copy that is integrated into the chro- 
mosome DNA of the host cell 


pyrimidines Type of nitrogenous base with a single chemi- 
cal ring found in DNA and RNA 


quantitative trait loci (QTL) Specific chromosome regions 
containing several genes that influence a complex trait such 
as crop yield 


query A request for the analysis of a specific DNA, RNA, or 
amino acid sequence submitted to a database 


random match probability (RMP) The probability that a 
DNA profile in question would be found in a person who 
was randomly selected from the same racial/ethnic group to 
which the defendant in a criminal trial belongs 
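A simplified random match probability can be sketched by combining the product rule with Hardy-Weinberg genotype frequencies. This sketch assumes Hardy-Weinberg equilibrium and independent loci, and uses made-up allele frequencies. Published forensic guidelines add corrections (for example, for population substructure) that are not shown here.

```python
def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency: p*p for a homozygote, 2*p*q for a heterozygote."""
    if q is None:
        return p * p
    return 2 * p * q


def random_match_probability(loci):
    """Multiply genotype frequencies across loci assumed independent (product rule).

    Each entry in `loci` is a tuple of allele frequencies: one value for a
    homozygous genotype, two values for a heterozygous genotype.
    """
    rmp = 1.0
    for allele_freqs in loci:
        rmp *= genotype_frequency(*allele_freqs)
    return rmp


# Hypothetical three-locus profile with made-up allele frequencies
profile = [(0.1, 0.2), (0.3,), (0.05, 0.1)]
print(random_match_probability(profile))  # about 3.6e-05 (0.04 * 0.09 * 0.01)
```

Each additional locus multiplies in another small frequency, which is why profiles built from many markers yield very small match probabilities.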


receptor Protein that binds to another molecule, such as a 
hormone or a nutrient, and participates in cell to cell signaling; 
receptors are located on the outer surface of the cell mem- 
brane 


recessive Phenotypic expression of a genetic allele only in 
homozygous cells that lack wildtype forms of the gene 
recessive allele Usually encodes a nonfunctional protein
product 


reciprocal translocation Translocation involving an equal 
exchange of DNA sequences between two chromosomes 


recognition sequence A specific DNA sequence that is
recognized by a specific protein, such as a restriction
endonuclease that cleaves the DNA molecule at that
sequence


recombinant DNA A molecule containing foreign DNA 
sequences covalently linked to vector or host DNA sequences 


recombinant DNA cloning The cloning and propagation in
bacteria of a double-stranded DNA vector carrying a foreign
gene


recombinant DNA plasmids Circular double-stranded
molecules containing both vector DNA and one or more foreign genes


recombination (crossing-over) An exchange of DNA that
creates new combinations of genes and genetic information


reference databases Collections of DNA profiles of many 
people who were not involved in crimes but who volunteered 
to give their DNA to law enforcement agencies to determine 
the frequencies of common marker alleles and genotypes in 
the general population 


regeneration A healing process that involves inflammation,
cell proliferation, and tissue remodeling


rejection A serious complication that often occurs when a
patient receives a transplant of cells or tissues from a
genetically unrelated, unmatched donor


release factors Proteins that recognize a stop codon during 
translation and cause the finished polypeptide to be released 
from the ribosome 


repeated DNA sequences Stretches of DNA bases that are 
repeated in many locations throughout a genome 


replica plating A technique in which one or more secondary
petri plates containing different types of solid (agar-based)
selective growth media are inoculated with the same colonies
of microorganisms from a primary plate (or master dish);
a sterile cloth is pressed gently onto the colonies on the
primary plate and then pressed onto each fresh secondary plate
to reproduce the original spatial pattern of colonies


replication fork The region of replicating DNA that makes up
half of a replication bubble; the replication fork contains
the replicating DNA and the enzymes and other proteins
involved in DNA synthesis


replication origin A specific DNA sequence at which DNA
replication is initiated


repressor proteins Regulatory proteins that repress gene
expression by preventing a gene from being transcribed into RNA


reproductive cloning The production of genetically identical
animals using somatic cell nuclear transfer methods


restriction fragment length polymorphisms (RFLPs)
Differences between individuals in the DNA sequences of
restriction enzyme cleavage sites; these differences produce DNA
fragments of different lengths when a person’s genome is cut
by that restriction enzyme and are used to map genes and
identify people


result The information obtained from a query to a database 
or the data obtained from a scientific experiment 


retrovirus A virus with an RNA genome that infects cells by
reversing the usual flow of genetic information in the cell; the
viral RNA is copied into DNA, which is integrated into the
chromosome DNA (provirus)


reverse transcriptase A protein enzyme that copies single- 
stranded RNA into double-stranded DNA 


ribonuclease An enzyme that degrades RNA molecules in
the cell


ribosomal RNA (rRNA) RNA molecules that bind to ribosome
proteins and form an essential part of the structure and
function of a ribosome


ribosomes Very large multiprotein-rRNA complexes that 
perform protein synthesis (translation) in the cell 


ribozyme An RNA molecule that acts as an enzyme; some
ribozymes work as the catalytic component of an RNA-protein
complex, such as the ribosome


RNAi gene silencing A research technique in which the 
expression of a gene in live cells can be turned off when 
desired 


RNA polymerase Enzyme that copies DNA templates into 
RNA strands (transcription) 


RNA polymerase I Eukaryotic RNA polymerase that
transcribes the DNA genes encoding the large ribosomal RNAs


RNA polymerase II Eukaryotic RNA polymerase that
transcribes the DNA genes encoding proteins in the cell (also
called structural genes/proteins)


RNA polymerase III Eukaryotic RNA polymerase that
transcribes the genes for 5S ribosomal RNA and the transfer
RNA genes


score (S) The numerical value assigned to each match
resulting from a BLAST database search (the higher the score, 
the better the match) 


segregated Describes chromosomes that have moved to
opposite poles during cell division


selectable marker gene A gene carried on a plasmid (for
example, an antibiotic resistance gene) that allows only cells
carrying the plasmid to grow on a selective medium, while cells
that have not taken up the plasmid are killed (a selection
method)


semiconservative Mode of DNA replication in which each 
daughter DNA molecule contains one of the two original 
DNA strands and one new complementary DNA strand 


senescence The process of growing old for organisms and 
individual cells 


sequence alignment A way of arranging DNA, RNA, or
protein sequences to identify regions of sequence similarity
that might be of functional, structural, or evolutionary
significance


serum The liquid portion of blood that remains after clotting;
blood plasma lacking the clotting factor proteins


sexually propagated The reproduction of plants using seeds 
or spores 


Shine-Dalgarno sequence A conserved sequence near the
start of prokaryotic mRNA that is recognized and bound by an
rRNA in the prokaryotic ribosome


short tandem repeat (STR) Subset of VNTR (variable number 
tandem repeats) made up of short repeated DNA sequences 


shuttle vectors Vectors that can replicate in more than one
type of cell and can transfer or shuttle between the two
organisms as desired


signal sequence A stretch of hydrophobic amino acid residues
at the amino terminus of secretory or integral membrane
proteins that directs the protein to the appropriate cell
membrane


single nucleotide polymorphisms (SNPs) Single base pair 
differences between two individual genomes 


small interfering RNAs (siRNAs) Short RNA molecules 
involved in controlling gene expression 


spliced Describes RNA from which the introns have been
removed and the exons joined together to produce the
mature mRNA


spliceosome A ribonucleoprotein complex that removes the
introns from precursor mRNAs and joins the exons together to
form functional mRNA


spontaneous mutation A change in DNA sequence usually 
due to errors in the normal functioning of cellular enzymes 


Src The first oncogene to be identified and studied in the 
laboratory 


stem cell niche The immediate environment surrounding 
the locations of stem cells in the body 


stringency Conditions that influence the base pairing
(hybridization) interactions between complementary
nucleotide sequences


structural genes DNA sequences that code for structural
proteins in the cell


sustainable agriculture A system of farming that takes into
account the environmental, social, and economic issues
relating to the growth and distribution of crops


synapse The gap in the junction between the ends of two 
nerve cells 


syndactyly The most common congenital anomaly of the
hand, marked by the persistence of webbing between fingers
or toes after birth


TATA box DNA binding site for a transcription factor that
guides RNA polymerase II to the promoter region of
eukaryotic genes


telomere Special repeated sequences located at the ends of 
linear eukaryotic chromosomes that are required to replicate 
the DNA at the ends of the chromosomes 


teosinte A tall annual grass that is closely related to and
possibly the ancestor of Indian corn


termination signal A specific DNA sequence that signals the 
release of both RNA polymerase and the newly made RNA 
transcript from the DNA template 


tetranucleotide repeat A type of short tandem repeat in
which a four-base-pair sequence is repeated 5 to 50 times in
the human genome; these repeats are used in most forensic DNA
testing methods


time to the most recent common ancestor (TMRCA) The
amount of time or number of generations since individuals
last shared a common ancestor


therapeutic cloning Cloning of an embryo for the purpose 
of deriving embryonic stem cells for therapeutic treatments 


thymine (T) A pyrimidine base found in DNA but not in 
RNA 


thymidine kinase gene (tk) A gene often carried in
transgenic vectors used in mammalian cells; it codes for an
enzyme that phosphorylates the nucleoside analog ganciclovir,
converting it into a toxic compound that kills cells expressing
the gene


traits Genetically determined characteristics and
phenotypes exhibited by an individual


transcription Process by which genetic information in DNA 
is copied into an RNA transcript 


transcriptionally silent Region of a genome in which no 
transcription occurs (gene expression is off) 


transcription factor Protein that regulates gene expression
by binding to DNA in the control regions of the gene and
interacting with the RNA polymerase enzyme


transcription initiation The start of transcription, brought
about by interactions between proteins, promoters, and RNA
polymerase


transduction The transfer of genetic information from one 
bacterium to another as a result of a bacteriophage carrying 
bacterial chromosome DNA 


transfecting The process of transferring DNA (usually of a 
gene) into a cultured cell using a virus-based vector 


transfection A process in which virus-based vectors are 
used to transfer recombinant DNA into cells 


transfer RNA (tRNA) Short RNA molecules that carry amino 
acids to the ribosome and help to translate the genetic code 
during protein synthesis 


transformation Process in which genes are transferred into 
bacterial or mammalian cells as foreign DNA carried in a 
plasmid-based vector 


transforming substance The substance, later identified as
DNA, that was found in early experiments to change (transform)
the biochemistry of bacterial cells


transgene The foreign gene that is inserted into a plant or
animal of interest using genetic engineering and recombinant
DNA methods


transgenic animals Animals carrying a foreign DNA transgene
integrated into their genomes


transgenic plants Plants containing an integrated foreign
gene (transgene)


translocations Breakage of a segment of DNA from one
chromosome and its attachment to a completely different
chromosome


trisomy A condition in which an extra copy of an entire
chromosome is inherited by an individual’s cells (for example,
trisomy 21 causes Down syndrome)


trophectoderm The outer layer of cells of the blastocyst-stage
embryo, which gives rise to the fetal part of the placenta; it
forms before the ectoderm, endoderm, and mesoderm layers
differentiate


tumor suppressor genes Genes that act to prevent unwanted
cell division and thereby suppress the development of cancer
cells


undifferentiated cells Cells that have not yet become
specialized and do not yet exhibit the morphological and
functional characteristics they will acquire upon
differentiation


uniregional model One of three theories of the evolution of
Homo erectus into the anatomically modern human (Homo
sapiens); it proposes that recent African migrants did not
interbreed with hominids they encountered in Europe and
Asia but, rather, these other hominid groups became extinct
and the African migrants replaced them


upstream Regions of the DNA genome located to the 5’ side
of each gene (before the start of the transcribed region)


uracil (U) A pyrimidine base found in RNA but not in DNA 


vectors DNA molecules modified for use as carriers to trans- 
port foreign genes 


virus packaging extract An extract from infected cells that
rapidly packages lambda (λ) vector DNA, along with any
inserted foreign DNA, into virus particles that transfer the
DNA into the bacterial cells


xenotransplantation Transplanting organs between species, 
such as organs transplanted from animals into humans to treat 
organ failure 


X-linked gene A gene located on the X chromosome 


x-ray diffraction A process in which crystallized molecules
are rotated and bombarded with x-rays to determine
structural information about the molecule


yeast artificial chromosome (YAC) A synthetic chromosome 
vector made from yeast genome DNA that can carry large 
inserts of foreign DNA and replicate in yeast cells as normal 
linear chromosomes 


zygote The cell resulting from the union of a male and a 
female gamete during fertilization 


zymogen A protein that requires a chemical change for the 
molecule to become an active enzyme 